* [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
@ 2023-01-18  6:12 Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 01/14] drm: execution context for GEM buffers Danilo Krummrich
                   ` (14 more replies)
  0 siblings, 15 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This patch series provides a new UAPI for the Nouveau driver in order to
support Vulkan features, such as sparse bindings and sparse residency.

Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
keep track of GPU virtual address (VA) mappings in a more generic way.

The DRM GPUVA manager is intended to help drivers implement userspace-manageable
GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
serves the following purposes in this context; a brief initialization sketch
follows the list.

    1) Provide a dedicated range allocator to track GPU VA allocations and
       mappings, making use of the drm_mm range allocator.

    2) Generically connect GPU VA mappings to their backing buffers, in
       particular DRM GEM objects.

    3) Provide a common implementation to perform more complex mapping
       operations on the GPU VA space. In particular splitting and merging
       of GPU VA mappings, e.g. for intersecting mapping requests or partial
       unmap requests.
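
To make this more concrete, a minimal sketch of how a driver might set up such
a GPU VA space with the interfaces introduced later in this series; the
surrounding structure, name and address layout are made up for illustration:

	/* Hypothetical driver code; struct my_vm and the VA layout are made up. */
	#include <drm/drm_gpuva_mgr.h>

	struct my_vm {
		struct drm_gpuva_manager mgr;	/* embedded, no extra allocation */
		/* ... driver specific state ... */
	};

	static void my_vm_init(struct my_vm *vm)
	{
		/* vm is assumed to be zero-initialized (e.g. kzalloc()'d). */
		drm_gpuva_manager_init(&vm->mgr, "example",
				       0, 1ULL << 47,	/* VA space start and range */
				       0, 0x400000);	/* kernel reserved area */
	}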

The new Nouveau VM_BIND UAPI builds on top of the DRM GPUVA manager and provides
the following new interfaces.

    1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
       for UMDs to specify the portion of VA space managed by the kernel and
       userspace, respectively.

    2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

    3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
scheduler to queue jobs and support asynchronous processing with DRM syncobjs
as the synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing, while
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
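
For illustration, a rough sketch of the intended flow from userspace; the
struct names follow the usual DRM ioctl naming convention, but their layouts
are placeholders and do not reflect the actual definitions this series adds to
include/uapi/drm/nouveau_drm.h:

	/* Hypothetical userspace sketch; struct contents are placeholders. */
	#include <sys/ioctl.h>

	struct drm_nouveau_vm_init init = { /* kernel managed VA window */ };
	struct drm_nouveau_vm_bind bind = { /* map/unmap op, addr, range, GEM handle, syncobjs */ };
	struct drm_nouveau_exec exec = { /* push buffers, wait/signal syncobjs */ };

	ioctl(fd, DRM_IOCTL_NOUVEAU_VM_INIT, &init); /* once, splits the VA space */
	ioctl(fd, DRM_IOCTL_NOUVEAU_VM_BIND, &bind); /* synchronous by default */
	ioctl(fd, DRM_IOCTL_NOUVEAU_EXEC, &exec);    /* asynchronous, completion via syncobj */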

The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
for GEM buffers) by Christian König. Since the patch implementing drm_exec has
not yet been merged into drm-next, it is part of this series, as well as a small
fix for this patch, which was found while testing this series.

This patch series is also available at [1].

There is a Mesa NVK merge request by Dave Airlie [2] implementing the
corresponding userspace parts for this series.

The Vulkan CTS test suite passes the sparse binding and sparse residency test
cases for the new UAPI together with Dave's Mesa work.

There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
and hence the DRM GPU VA manager. However, most of them test the DRM GPU VA
manager's logic through Nouveau's new UAPI and should be considered just as
helpers for the implementation.

However, I absolutely intend to convert those test cases to proper KUnit test
cases for the DRM GPUVA manager, if and once we agree on its usefulness and
design.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
    https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
[2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
[3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind

I also want to give credit to Dave Airlie, who contributed a lot of ideas to
this patch series.

Christian König (1):
  drm: execution context for GEM buffers

Danilo Krummrich (13):
  drm/exec: fix memory leak in drm_exec_prepare_obj()
  drm: manager to keep track of GPUs VA mappings
  drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  drm/nouveau: new VM_BIND uapi interfaces
  drm/nouveau: get vmm via nouveau_cli_vmm()
  drm/nouveau: bo: initialize GEM GPU VA interface
  drm/nouveau: move usercopy helpers to nouveau_drv.h
  drm/nouveau: fence: fail to emit when fence context is killed
  drm/nouveau: chan: provide nouveau_channel_kill()
  drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  drm/nouveau: implement uvmm for user mode bindings
  drm/nouveau: implement new VM_BIND UAPI
  drm/nouveau: debugfs: implement DRM GPU VA debugfs

 Documentation/gpu/driver-uapi.rst             |   11 +
 Documentation/gpu/drm-mm.rst                  |   43 +
 drivers/gpu/drm/Kconfig                       |    6 +
 drivers/gpu/drm/Makefile                      |    3 +
 drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
 drivers/gpu/drm/drm_debugfs.c                 |   56 +
 drivers/gpu/drm/drm_exec.c                    |  294 ++++
 drivers/gpu/drm/drm_gem.c                     |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
 drivers/gpu/drm/nouveau/Kbuild                |    3 +
 drivers/gpu/drm/nouveau/Kconfig               |    2 +
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
 drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
 drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
 drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
 drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
 drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
 drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
 drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
 drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
 include/drm/drm_debugfs.h                     |   25 +
 include/drm/drm_drv.h                         |    6 +
 include/drm/drm_exec.h                        |  144 ++
 include/drm/drm_gem.h                         |   75 +
 include/drm/drm_gpuva_mgr.h                   |  527 +++++++
 include/uapi/drm/nouveau_drm.h                |  216 +++
 47 files changed, 5266 insertions(+), 126 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
 create mode 100644 include/drm/drm_exec.h
 create mode 100644 include/drm/drm_gpuva_mgr.h


base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
-- 
2.39.0



* [PATCH drm-next 01/14] drm: execution context for GEM buffers
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

From: Christian König <christian.koenig@amd.com>

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTM execbuf util and intended to replace it
in the long term.

The basic functionality is to abstract the necessary loop to lock many
different GEM buffers with automated deadlock and duplicate handling.

v2: drop xarray and use a dynamically resized array instead; the locking
    overhead is unnecessary and measurable.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst       |  12 ++
 drivers/gpu/drm/Kconfig            |   6 +
 drivers/gpu/drm/Makefile           |   2 +
 drivers/gpu/drm/amd/amdgpu/Kconfig |   1 +
 drivers/gpu/drm/drm_exec.c         | 295 +++++++++++++++++++++++++++++
 include/drm/drm_exec.h             | 144 ++++++++++++++
 6 files changed, 460 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h
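
For reference, a sketch of how a driver's submission path might use these
helpers to lock an array of GEM objects and reserve fence slots; struct my_job
and its members are made up, the drm_exec interfaces are the ones added below:

/* Hypothetical driver snippet; struct my_job is made up. */
struct my_job {
	struct drm_gem_object **bos;
	unsigned int num_bos;
	struct dma_fence *fence;
};

static int my_job_lock_bos(struct my_job *job, struct drm_exec *exec)
{
	struct drm_gem_object *obj;
	unsigned long index;
	unsigned int i;
	int ret = 0;

	drm_exec_init(exec, true);
	drm_exec_while_not_all_locked(exec) {
		for (i = 0; i < job->num_bos; i++) {
			/* Lock the BO and reserve one fence slot. */
			ret = drm_exec_prepare_obj(exec, job->bos[i], 1);
			drm_exec_break_on_contention(exec);
			if (ret)
				goto out_fini;
		}
		/* Restart the retry loop if the inner loop hit contention. */
		drm_exec_continue_on_contention(exec);
	}

	/* All BOs are locked; attach the job fence to each of them. */
	drm_exec_for_each_locked_object(exec, index, obj)
		dma_resv_add_fence(obj->resv, job->fence, DMA_RESV_USAGE_WRITE);

out_fini:
	drm_exec_fini(exec);
	return ret;
}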

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a79fd3549ff8..a52e6f4117d6 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -493,6 +493,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 748b93d00184..05134256da59 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -200,6 +200,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 496fa5a6147a..4fe190aee584 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 5341b6b242c3..279fb3bba810 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -11,6 +11,7 @@ config DRM_AMDGPU
 	select DRM_SCHED
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select POWER_SUPPLY
 	select HWMON
 	select I2C
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..ed2106c22786
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,295 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc..).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked, fence slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Initialize the drm_exec_objects container */
+static void drm_exec_objects_init(struct drm_exec_objects *container)
+{
+	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	container->max_objects = container->objects ?
+		PAGE_SIZE / sizeof(void *) : 0;
+	container->num_objects = 0;
+}
+
+/* Cleanup the drm_exec_objects container */
+static void drm_exec_objects_fini(struct drm_exec_objects *container)
+{
+	kvfree(container->objects);
+}
+
+/* Make sure we have enough room and add an object to the container */
+static int drm_exec_objects_add(struct drm_exec_objects *container,
+				struct drm_gem_object *obj)
+{
+	if (unlikely(container->num_objects == container->max_objects)) {
+		size_t size = container->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		container->objects = tmp;
+		container->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	container->objects[container->num_objects++] = obj;
+	return 0;
+}
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_duplicate_object(exec, index, obj)
+		drm_gem_object_put(obj);
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked and duplicate
+ * objects.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	drm_exec_objects_init(&exec->locked);
+	drm_exec_objects_init(&exec->duplicates);
+	exec->contended = DRM_EXEC_DUMMY;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	drm_exec_objects_fini(&exec->locked);
+	drm_exec_objects_fini(&exec->duplicates);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Cleanup the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->locked.num_objects = 0;
+	exec->duplicates.num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the container and reserve fences */
+static int drm_exec_obj_locked(struct drm_exec_objects *container,
+			       struct drm_gem_object *obj,
+			       unsigned int num_fences)
+{
+	int ret;
+
+	if (container) {
+		ret = drm_exec_objects_add(container, obj);
+		if (ret)
+			return ret;
+	}
+
+	if (num_fences) {
+		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		if (ret)
+			goto error_erase;
+	}
+
+	return 0;
+
+error_erase:
+	if (container) {
+		--container->num_objects;
+		drm_gem_object_put(obj);
+	}
+	return ret;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
+	if (unlikely(ret))
+		dma_resv_unlock(obj->resv);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container. Duplicates
+ * are detected as well and automatically moved into the duplicates container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY)) {
+		struct drm_exec_objects *container = &exec->duplicates;
+
+		/*
+		 * If this is the first locked GEM object it was most likely
+		 * just contended. So don't add it to the duplicates, just
+		 * reserve the fence slots.
+		 */
+		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
+			container = NULL;
+
+		ret = drm_exec_obj_locked(container, obj, num_fences);
+		if (ret)
+			return ret;
+
+	} else if (unlikely(ret)) {
+		return ret;
+
+	} else {
+		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
+		if (ret)
+			goto error_unlock;
+	}
+
+	drm_gem_object_get(obj);
+	return 0;
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..f73981c6292e
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec_objects - Container for GEM objects in a drm_exec
+ */
+struct drm_exec_objects {
+	unsigned int		num_objects;
+	unsigned int		max_objects;
+	struct drm_gem_object	**objects;
+};
+
+/**
+ * drm_exec_objects_for_each - iterate over all the objects inside the container
+ */
+#define drm_exec_objects_for_each(array, index, obj)		\
+	for (index = 0, obj = (array)->objects[0];		\
+	     index < (array)->num_objects;			\
+	     ++index, obj = (array)->objects[index])
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptible
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @locked: container for the locked GEM objects
+	 */
+	struct drm_exec_objects	locked;
+
+	/**
+	 * @duplicates: container for the duplicated GEM objects
+	 */
+	struct drm_exec_objects	duplicates;
+
+	/**
+	 * @contended: contended GEM object we backed off for.
+	 */
+	struct drm_gem_object	*contended;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->locked, index, obj)
+
+/**
+ * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the duplicate GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to cleanup
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a subordinate loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a subordinate loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+/**
+ * drm_exec_has_duplicates - check for duplicated GEM object
+ * @exec: drm_exec object
+ *
+ * Return true if the drm_exec object has encountered some already locked GEM
+ * objects while trying to lock them. This can happen if multiple GEM objects
+ * share the same underlying resv object.
+ */
+static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
+{
+	return exec->duplicates.num_objects > 0;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+
+#endif
-- 
2.39.0



* [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj()
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 01/14] drm: execution context for GEM buffers Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  8:51   ` Christian König
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Don't call drm_gem_object_get() unconditionally at the end of
drm_exec_prepare_obj(); the object already gets a reference when it is added to
a container via drm_exec_obj_locked(), hence the unconditional one is leaked.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/drm_exec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
index ed2106c22786..5713a589a6a3 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -282,7 +282,6 @@ int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
 			goto error_unlock;
 	}
 
-	drm_gem_object_get(obj);
 	return 0;
 
 error_unlock:
-- 
2.39.0



* [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 01/14] drm: execution context for GEM buffers Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-19  4:14   ` Bagas Sanjaya
                     ` (4 more replies)
  2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
                   ` (11 subsequent siblings)
  14 siblings, 5 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This adds the infrastructure for a manager implementation to keep track
of GPU virtual address (VA) mappings.

New UAPIs, motivated by the Vulkan sparse memory bindings that graphics drivers
have started to implement, allow userspace applications to request multiple and
arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide a dedicated range allocator to track GPU VA allocations and
   mappings, making use of the drm_mm range allocator.

2) Generically connect GPU VA mappings to their backing buffers, in
   particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
   operations on the GPU VA space. In particular splitting and merging
   of GPU VA mappings, e.g. for intersecting mapping requests or partial
   unmap requests.

Idea-suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/drm-mm.rst    |   31 +
 drivers/gpu/drm/Makefile        |    1 +
 drivers/gpu/drm/drm_gem.c       |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
 include/drm/drm_drv.h           |    6 +
 include/drm/drm_gem.h           |   75 ++
 include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
 7 files changed, 1966 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 include/drm/drm_gpuva_mgr.h
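
Before the diff, a brief sketch of how a driver is expected to consume the
split/merge operations generated by the manager; my_vm and the comments are
made up, while drm_gpuva_sm_map_ops_create(), drm_gpuva_for_each_op and
drm_gpuva_ops_free are used as referenced by the documentation below (exact
macro arguments assumed):

/* Hypothetical driver snippet; struct my_vm embeds a drm_gpuva_manager. */
static int my_vm_sm_map(struct my_vm *vm, u64 addr, u64 range,
			struct drm_gem_object *obj, u64 offset)
{
	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	ops = drm_gpuva_sm_map_ops_create(&vm->mgr, addr, range,
					  obj, offset);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:
			/* program PTEs, drm_gpuva_insert() and link the new VA */
			break;
		case DRM_GPUVA_OP_REMAP:
			/* unmap op->remap.unmap->va, re-map op->remap.prev/next */
			break;
		case DRM_GPUVA_OP_UNMAP:
			/* tear down op->unmap.va, then drm_gpuva_destroy_*() it */
			break;
		default:
			break;
		}
	}

	drm_gpuva_ops_free(ops);
	return 0;
}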

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a52e6f4117d6..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
 .. kernel-doc:: drivers/gpu/drm/drm_mm.c
    :export:
 
+DRM GPU VA Manager
+==================
+
+Overview
+--------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Overview
+
+Split and Merge
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Split and Merge
+
+Locking
+-------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :doc: Locking
+
+
+DRM GPU VA Manager Function References
+--------------------------------------
+
+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
+   :export:
+
 DRM Buddy Allocator
 ===================
 
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 4fe190aee584..de2ffca3b6e4 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -45,6 +45,7 @@ drm-y := \
 	drm_vblank.o \
 	drm_vblank_work.o \
 	drm_vma_manager.o \
+	drm_gpuva_mgr.o \
 	drm_writeback.o
 drm-$(CONFIG_DRM_LEGACY) += \
 	drm_agpsupport.o \
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 59a0bb5ebd85..65115fe88627 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	if (!obj->resv)
 		obj->resv = &obj->_resv;
 
+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
+		drm_gem_gpuva_init(obj);
+
 	drm_vma_node_reset(&obj->vma_node);
 	INIT_LIST_HEAD(&obj->lru_node);
 }
diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
new file mode 100644
index 000000000000..e665f642689d
--- /dev/null
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -0,0 +1,1323 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <drm/drm_gem.h>
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DOC: Overview
+ *
+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager, keeps track
+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
+ * mappings represented by &drm_gpuva objects. It also keeps track of the
+ * mapping's backing &drm_gem_object buffers.
+ *
+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
+ * &drm_gpuva objects representing all existent GPU VA mappings using this
+ * &drm_gem_object as backing buffer.
+ *
+ * A GPU VA mapping can only be created within a previously allocated
+ * &drm_gpuva_region, which represents a reserved portion of the GPU VA space.
+ * GPU VA mappings are not allowed to span over a &drm_gpuva_region's boundary.
+ *
+ * GPU VA regions can also be flagged as sparse, which allows drivers to create
+ * sparse mappings for a whole GPU VA region in order to support Vulkan
+ * 'Sparse Resources'.
+ *
+ * The GPU VA manager internally uses the &drm_mm range allocator to manage the
+ * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
+ * space.
+ *
+ * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
+ * the &drm_gpuva_manager contains a special region representing the portion of
+ * VA space reserved by the kernel. This node is initialized together with the
+ * GPU VA manager instance and removed when the GPU VA manager is destroyed.
+ *
+ * In a typical application, drivers would embed struct drm_gpuva_manager,
+ * struct drm_gpuva_region and struct drm_gpuva within their own driver
+ * specific structures; hence, the manager neither allocates memory of its own
+ * nor allocates &drm_gpuva or &drm_gpuva_region entries.
+ */
+
+/**
+ * DOC: Split and Merge
+ *
+ * The DRM GPU VA manager also provides an algorithm implementing splitting and
+ * merging of existent GPU VA mappings with the ones that are requested to be
+ * mapped or unmapped. This feature is required by the Vulkan API to implement
+ * Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this as
+ * VM BIND.
+ *
+ * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list of map, unmap
+ * and remap operations for a given newly requested mapping. This list
+ * represents the set of operations to execute in order to integrate the new
+ * mapping cleanly into the current state of the GPU VA space.
+ *
+ * Depending on how the new GPU VA mapping intersects with the existent mappings
+ * of the GPU VA space the &drm_gpuva_ops contain an arbitrary amount of unmap
+ * operations, a maximum of two remap operations and a single map operation.
+ * The set of operations can also be empty if no operation is required, e.g. if
+ * the requested mapping already exists in the exact same way.
+ *
+ * The single map operation, if existent, represents the original map operation
+ * requested by the caller. Please note that this operation might be altered
+ * comparing it with the original map operation, e.g. because it was merged with
+ * an already  existent mapping. Hence, drivers must execute this map operation
+ * instead of the original one they passed to drm_gpuva_sm_map_ops_create().
+ *
+ * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
+ * &drm_gpuva to unmap is physically contiguous with the original mapping
+ * request. Optionally, if 'keep' is set, drivers may keep the actual page table
+ * entries for this &drm_gpuva, adding only the missing page table entries, and
+ * update the &drm_gpuva_manager's view of things accordingly.
+ *
+ * Drivers may do the same optimization, namely delta page table updates, also
+ * for remap operations. This is possible since &drm_gpuva_op_remap consists of
+ * one unmap operation and one or two map operations, such that drivers can
+ * derive the page table update delta accordingly.
+ *
+ * Note that there can't be more than two existent mappings to split up, one at
+ * the beginning and one at the end of the new mapping, hence there is a
+ * maximum of two remap operations.
+ *
+ * Generally, the DRM GPU VA manager never merges mappings across the
+ * boundaries of &drm_gpuva_regions. This is the case since merging between
+ * GPU VA regions would result in unmap and map operations being issued for
+ * both regions involved, although the original mapping request referred to
+ * one specific GPU VA region only. Since the other GPU VA region, the one not
+ * explicitly requested to be altered, might be in use by the GPU, we are not
+ * allowed to issue any map/unmap operations for this region.
+ *
+ * Note that before calling drm_gpuva_sm_map_ops_create() again with another
+ * mapping request it is necessary to update the &drm_gpuva_manager's view of
+ * the GPU VA space. The previously obtained operations must be either fully
+ * processed or completely abandoned.
+ *
+ * To update the &drm_gpuva_manager's view of the GPU VA space
+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
+ * drm_gpuva_destroy_unlocked() should be used.
+ *
+ * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
+ * provides drivers the list of operations to be executed in order to unmap
+ * a range of GPU VA space. The logic behind this function is much simpler
+ * though: For all existent mappings enclosed by the given range, unmap
+ * operations are created. For mappings which are only partially located within
+ * the given range, remap operations are created such that those mappings are
+ * split up and re-mapped partially.
+ *
+ * The following paragraph depicts the basic constellations of existent GPU VA
+ * mappings, a newly requested mapping and the resulting mappings as implemented
+ * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
+ * of those constellations.
+ *
+ * ::
+ *
+ *	1) Existent mapping is kept.
+ *	----------------------------
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ *	2) Existent mapping is replaced.
+ *	--------------------------------
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     a     1
+ *	req: |-----------| (bo_offset=m)
+ *
+ *	     0     a     1
+ *	new: |-----------| (bo_offset=m)
+ *
+ *
+ *	3) Existent mapping is replaced.
+ *	--------------------------------
+ *
+ *	     0     a     1
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     b     1
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ *	4) Existent mapping is replaced.
+ *	--------------------------------
+ *
+ *	     0  a  1
+ *	old: |-----|       (bo_offset=n)
+ *
+ *	     0     a     2
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *	Note: We expect to see the same result for a request with a different bo
+ *	      and/or bo_offset.
+ *
+ *
+ *	5) Existent mapping is split.
+ *	-----------------------------
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  b  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0  b  1  a' 2
+ *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
+ *
+ *	Note: We expect to see the same result for a request with a different bo
+ *	      and/or non-contiguous bo_offset.
+ *
+ *
+ *	6) Existent mapping is kept.
+ *	----------------------------
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	     0  a  1
+ *	req: |-----|       (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ *	7) Existent mapping is split.
+ *	-----------------------------
+ *
+ *	     0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  b  2
+ *	req:       |-----| (bo_offset=m)
+ *
+ *	     0  a  1  b  2
+ *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ *	8) Existent mapping is kept.
+ *	----------------------------
+ *
+ *	      0     a     2
+ *	old: |-----------| (bo_offset=n)
+ *
+ *	           1  a  2
+ *	req:       |-----| (bo_offset=n+1)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *
+ *	9) Existent mapping is split.
+ *	-----------------------------
+ *
+ *	     0     a     2
+ *	old: |-----------|       (bo_offset=n)
+ *
+ *	           1     b     3
+ *	req:       |-----------| (bo_offset=m)
+ *
+ *	     0  a  1     b     3
+ *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
+ *
+ *
+ *	10) Existent mapping is merged.
+ *	-------------------------------
+ *
+ *	     0     a     2
+ *	old: |-----------|       (bo_offset=n)
+ *
+ *	           1     a     3
+ *	req:       |-----------| (bo_offset=n+1)
+ *
+ *	     0        a        3
+ *	new: |-----------------| (bo_offset=n)
+ *
+ *
+ *	11) Existent mapping is split.
+ *	------------------------------
+ *
+ *	     0        a        3
+ *	old: |-----------------| (bo_offset=n)
+ *
+ *	           1  b  2
+ *	req:       |-----|       (bo_offset=m)
+ *
+ *	     0  a  1  b  2  a' 3
+ *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
+ *
+ *
+ *	12) Existent mapping is kept.
+ *	-----------------------------
+ *
+ *	     0        a        3
+ *	old: |-----------------| (bo_offset=n)
+ *
+ *	           1  a  2
+ *	req:       |-----|       (bo_offset=n+1)
+ *
+ *	     0        a        3
+ *	new: |-----------------| (bo_offset=n)
+ *
+ *
+ *	13) Existent mapping is replaced.
+ *	---------------------------------
+ *
+ *	           1  a  2
+ *	old:       |-----| (bo_offset=n)
+ *
+ *	     0     a     2
+ *	req: |-----------| (bo_offset=n)
+ *
+ *	     0     a     2
+ *	new: |-----------| (bo_offset=n)
+ *
+ *	Note: We expect to see the same result for a request with a different bo
+ *	      and/or non-contiguous bo_offset.
+ *
+ *
+ *	14) Existent mapping is replaced.
+ *	---------------------------------
+ *
+ *	           1  a  2
+ *	old:       |-----| (bo_offset=n)
+ *
+ *	     0        a       3
+ *	req: |----------------| (bo_offset=n)
+ *
+ *	     0        a       3
+ *	new: |----------------| (bo_offset=n)
+ *
+ *	Note: We expect to see the same result for a request with a different bo
+ *	      and/or non-contiguous bo_offset.
+ *
+ *
+ *	15) Existent mapping is split.
+ *	------------------------------
+ *
+ *	           1     a     3
+ *	old:       |-----------| (bo_offset=n)
+ *
+ *	     0     b     2
+ *	req: |-----------|       (bo_offset=m)
+ *
+ *	     0     b     2  a' 3
+ *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
+ *
+ *
+ *	16) Existent mappings are merged.
+ *	---------------------------------
+ *
+ *	     0     a     1
+ *	old: |-----------|                        (bo_offset=n)
+ *
+ *	                            2     a     3
+ *	old':                       |-----------| (bo_offset=n+2)
+ *
+ *	                1     a     2
+ *	req:            |-----------|             (bo_offset=n+1)
+ *
+ *	                      a
+ *	new: |----------------------------------| (bo_offset=n)
+ */
+
+/**
+ * DOC: Locking
+ *
+ * Generally, the GPU VA manager does not take care of locking itself; it is
+ * the driver's responsibility to take care of locking. Drivers might want to
+ * protect the following operations: inserting, destroying and iterating
+ * &drm_gpuva and &drm_gpuva_region objects as well as generating split and merge
+ * operations.
+ *
+ * The GPU VA manager does take care of the locking of the backing
+ * &drm_gem_object buffers' GPU VA lists though, unless the provided function's
+ * documentation claims otherwise.
+ */
+
+/**
+ * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to initialize
+ * @name: the name of the GPU VA space
+ * @start_offset: the start offset of the GPU VA space
+ * @range: the size of the GPU VA space
+ * @reserve_offset: the start of the kernel reserved GPU VA area
+ * @reserve_range: the size of the kernel reserved GPU VA area
+ *
+ * The &drm_gpuva_manager must be initialized with this function before use.
+ *
+ * Note that @mgr must be cleared to 0 before calling this function. The given
+ * &name is expected to be managed by the surrounding driver structures.
+ */
+void
+drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+		       const char *name,
+		       u64 start_offset, u64 range,
+		       u64 reserve_offset, u64 reserve_range)
+{
+	drm_mm_init(&mgr->va_mm, start_offset, range);
+	drm_mm_init(&mgr->region_mm, start_offset, range);
+
+	mgr->mm_start = start_offset;
+	mgr->mm_range = range;
+
+	mgr->name = name ? name : "unknown";
+
+	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
+	mgr->kernel_alloc_node.start = reserve_offset;
+	mgr->kernel_alloc_node.size = reserve_range;
+	drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_init);
+
+/**
+ * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
+ * @mgr: pointer to the &drm_gpuva_manager to clean up
+ *
+ * Note that it is a bug to call this function on a manager that still
+ * holds GPU VA mappings.
+ */
+void
+drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
+{
+	mgr->name = NULL;
+	drm_mm_remove_node(&mgr->kernel_alloc_node);
+	drm_mm_takedown(&mgr->va_mm);
+	drm_mm_takedown(&mgr->region_mm);
+}
+EXPORT_SYMBOL(drm_gpuva_manager_destroy);
+
+static struct drm_gpuva_region *
+drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+
+	/* Find the VA region the requested range is strictly enclosed by. */
+	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
+		if (reg->node.start <= addr &&
+		    reg->node.start + reg->node.size >= addr + range &&
+		    &reg->node != &mgr->kernel_alloc_node)
+			return reg;
+	}
+
+	return NULL;
+}
+
+static bool
+drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
+{
+	return !!drm_gpuva_in_region(mgr, addr, range);
+}
+
+/**
+ * drm_gpuva_insert - insert a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
+ * @va: the &drm_gpuva to insert
+ * @addr: the start address of the GPU VA
+ * @range: the range of the GPU VA
+ *
+ * Insert a &drm_gpuva with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * The function assumes the caller does not hold the &drm_gem_object's
+ * GPU VA list mutex.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		 struct drm_gpuva *va,
+		 u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+	int ret;
+
+	if (!va->gem.obj)
+		return -EINVAL;
+
+	reg = drm_gpuva_in_region(mgr, addr, range);
+	if (!reg)
+		return -EINVAL;
+
+	ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
+					  range, 0,
+					  0, addr,
+					  addr + range,
+					  DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
+	if (ret)
+		return ret;
+
+	va->mgr = mgr;
+	va->region = reg;
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_gpuva_insert);
+
+/**
+ * drm_gpuva_link_locked - link a &drm_gpuva
+ * @va: the &drm_gpuva to link
+ *
+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * The function assumes the caller already holds the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_link_locked(struct drm_gpuva *va)
+{
+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
+	list_add_tail(&va->head, &va->gem.obj->gpuva.list);
+}
+EXPORT_SYMBOL(drm_gpuva_link_locked);
+
+/**
+ * drm_gpuva_link_unlocked - link a &drm_gpuva
+ * @va: the &drm_gpuva to unlink
+ *
+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * The function assumes the caller does not hold the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_link_unlocked(struct drm_gpuva *va)
+{
+	drm_gem_gpuva_lock(va->gem.obj);
+	drm_gpuva_link_locked(va);
+	drm_gem_gpuva_unlock(va->gem.obj);
+}
+EXPORT_SYMBOL(drm_gpuva_link_unlocked);
+
+/**
+ * drm_gpuva_unlink_locked - unlink a &drm_gpuva
+ * @va: the &drm_gpuva to unlink
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * The function assumes the caller already holds the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_unlink_locked(struct drm_gpuva *va)
+{
+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
+	list_del_init(&va->head);
+}
+EXPORT_SYMBOL(drm_gpuva_unlink_locked);
+
+/**
+ * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
+ * @va: the &drm_gpuva to unlink
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with.
+ *
+ * The function assumes the caller does not hold the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
+{
+	drm_gem_gpuva_lock(va->gem.obj);
+	drm_gpuva_unlink_locked(va);
+	drm_gem_gpuva_unlock(va->gem.obj);
+}
+EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
+
+/**
+ * drm_gpuva_destroy_locked - destroy a &drm_gpuva
+ * @va: the &drm_gpuva to destroy
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with and removes it from the underlying range allocator.
+ *
+ * The function assumes the caller already holds the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_destroy_locked(struct drm_gpuva *va)
+{
+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
+
+	list_del(&va->head);
+	drm_mm_remove_node(&va->node);
+}
+EXPORT_SYMBOL(drm_gpuva_destroy_locked);
+
+/**
+ * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
+ * @va: the &drm_gpuva to destroy
+ *
+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
+ * associated with and removes it from the underlying range allocator.
+ *
+ * The function assumes the caller does not hold the &drm_gem_object's
+ * GPU VA list mutex.
+ */
+void
+drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
+{
+	drm_gem_gpuva_lock(va->gem.obj);
+	list_del(&va->head);
+	drm_gem_gpuva_unlock(va->gem.obj);
+
+	drm_mm_remove_node(&va->node);
+}
+EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
+
+/**
+ * drm_gpuva_find - find a &drm_gpuva
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuvas address
+ * @range: the &drm_gpuvas range
+ *
+ * Returns: the &drm_gpuva at a given &addr and with a given &range
+ */
+struct drm_gpuva *
+drm_gpuva_find(struct drm_gpuva_manager *mgr,
+	       u64 addr, u64 range)
+{
+	struct drm_gpuva *va;
+
+	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
+		if (va->node.start == addr &&
+		    va->node.size == range)
+			return va;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(drm_gpuva_find);
+
+/**
+ * drm_gpuva_find_prev - find the &drm_gpuva before the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @start: the given GPU VA's start address
+ *
+ * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
+ *
+ * Note that if there is any free space between the GPU VA mappings no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
+{
+	struct drm_mm_node *node;
+
+	if (start <= mgr->mm_start ||
+	    start > (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
+	if (node == &mgr->va_mm.head_node)
+		return NULL;
+
+	return (struct drm_gpuva *)node;
+}
+EXPORT_SYMBOL(drm_gpuva_find_prev);
+
+/**
+ * drm_gpuva_find_next - find the &drm_gpuva after the given address
+ * @mgr: the &drm_gpuva_manager to search in
+ * @end: the given GPU VA's end address
+ *
+ * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
+ *
+ * Note that if there is any free space between the GPU VA mappings no mapping
+ * is returned.
+ *
+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
+ */
+struct drm_gpuva *
+drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
+{
+	struct drm_mm_node *node;
+
+	if (end < mgr->mm_start ||
+	    end >= (mgr->mm_start + mgr->mm_range))
+		return NULL;
+
+	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
+	if (node == &mgr->va_mm.head_node)
+		return NULL;
+
+	return (struct drm_gpuva *)node;
+}
+EXPORT_SYMBOL(drm_gpuva_find_next);
+
+/**
+ * drm_gpuva_region_insert - insert a &drm_gpuva_region
+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva_region in
+ * @reg: the &drm_gpuva_region to insert
+ * @addr: the start address of the GPU VA
+ * @range: the range of the GPU VA
+ *
+ * Insert a &drm_gpuva_region with a given address and range into a
+ * &drm_gpuva_manager.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+			struct drm_gpuva_region *reg,
+			u64 addr, u64 range)
+{
+	int ret;
+
+	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
+					  range, 0,
+					  0, addr,
+					  addr + range,
+					  DRM_MM_INSERT_LOW|
+					  DRM_MM_INSERT_ONCE);
+	if (ret)
+		return ret;
+
+	reg->mgr = mgr;
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_gpuva_region_insert);
+
+/**
+ * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
+ * @mgr: the &drm_gpuva_manager holding the region
+ * @reg: the &drm_gpuva_region to destroy
+ *
+ * This removes the given &reg from the underlying range allocator.
+ */
+void
+drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
+			 struct drm_gpuva_region *reg)
+{
+	struct drm_gpuva *va;
+
+	drm_gpuva_for_each_va_in_range(va, mgr,
+				       reg->node.start,
+				       reg->node.size) {
+		WARN(1, "GPU VA region must be empty on destroy.\n");
+		return;
+	}
+
+	if (&reg->node == &mgr->kernel_alloc_node) {
+		WARN(1, "Can't destroy kernel reserved region.\n");
+		return;
+	}
+
+	drm_mm_remove_node(&reg->node);
+}
+EXPORT_SYMBOL(drm_gpuva_region_destroy);
+
+/**
+ * drm_gpuva_region_find - find a &drm_gpuva_region
+ * @mgr: the &drm_gpuva_manager to search in
+ * @addr: the &drm_gpuva_regions address
+ * @range: the &drm_gpuva_regions range
+ *
+ * Returns: the &drm_gpuva_region at a given &addr and with a given &range
+ */
+struct drm_gpuva_region *
+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
+		      u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+
+	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
+		if (reg->node.start == addr &&
+		    reg->node.size == range)
+			return reg;
+
+	return NULL;
+}
+EXPORT_SYMBOL(drm_gpuva_region_find);
+
+static int
+gpuva_op_map_new(struct drm_gpuva_op **pop,
+		 u64 addr, u64 range,
+		 struct drm_gem_object *obj, u64 offset)
+{
+	struct drm_gpuva_op *op;
+
+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	op->op = DRM_GPUVA_OP_MAP;
+	op->map.va.addr = addr;
+	op->map.va.range = range;
+	op->map.gem.obj = obj;
+	op->map.gem.offset = offset;
+
+	return 0;
+}
+
+static int
+gpuva_op_remap_new(struct drm_gpuva_op **pop,
+		   struct drm_gpuva_op_map *prev,
+		   struct drm_gpuva_op_map *next,
+		   struct drm_gpuva_op_unmap *unmap)
+{
+	struct drm_gpuva_op *op;
+	struct drm_gpuva_op_remap *r;
+
+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	op->op = DRM_GPUVA_OP_REMAP;
+	r = &op->remap;
+
+	if (prev) {
+		r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
+		if (!r->prev)
+			goto err_free_op;
+	}
+
+	if (next) {
+		r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
+		if (!r->next)
+			goto err_free_prev;
+	}
+
+	r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
+	if (!r->unmap)
+		goto err_free_next;
+
+	return 0;
+
+err_free_next:
+	if (next)
+		kfree(r->next);
+err_free_prev:
+	if (prev)
+		kfree(r->prev);
+err_free_op:
+	kfree(op);
+	*pop = NULL;
+
+	return -ENOMEM;
+}
+
+static int
+gpuva_op_unmap_new(struct drm_gpuva_op **pop,
+		   struct drm_gpuva *va, bool merge)
+{
+	struct drm_gpuva_op *op;
+
+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	op->op = DRM_GPUVA_OP_UNMAP;
+	op->unmap.va = va;
+	op->unmap.keep = merge;
+
+	return 0;
+}
+
+#define op_map_new_to_list(_ops, _addr, _range,		\
+			   _obj, _offset)		\
+do {							\
+	struct drm_gpuva_op *op;			\
+							\
+	ret = gpuva_op_map_new(&op, _addr, _range,	\
+			       _obj, _offset);		\
+	if (ret)					\
+		goto err_free_ops;			\
+							\
+	list_add_tail(&op->entry, _ops);		\
+} while (0)
+
+#define op_remap_new_to_list(_ops, _prev, _next,	\
+			     _unmap)			\
+do {							\
+	struct drm_gpuva_op *op;			\
+							\
+	ret = gpuva_op_remap_new(&op, _prev, _next,	\
+				 _unmap);		\
+	if (ret)					\
+		goto err_free_ops;			\
+							\
+	list_add_tail(&op->entry, _ops);		\
+} while (0)
+
+#define op_unmap_new_to_list(_ops, _gpuva, _merge)	\
+do {							\
+	struct drm_gpuva_op *op;			\
+							\
+	ret = gpuva_op_unmap_new(&op, _gpuva, _merge);	\
+	if (ret)					\
+		goto err_free_ops;			\
+							\
+	list_add_tail(&op->entry, _ops);		\
+} while (0)
+
+/**
+ * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the new mapping
+ * @req_range: the range of the new mapping
+ * @req_obj: the &drm_gem_object to map
+ * @req_offset: the offset within the &drm_gem_object
+ *
+ * This function creates a list of operations to perform splitting and merging
+ * of existent mapping(s) with the newly requested one.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain map, unmap and remap operations, but it
+ * also can be empty if no operation is required, e.g. if the requested mapping
+ * already exists in the exact same way.
+ *
+ * There can be an arbitrary number of unmap operations, a maximum of two remap
+ * operations and a single map operation. The latter, if present, represents
+ * the original map operation requested by the caller. Please note that the
+ * map operation might have been modified, e.g. if it was merged with an
+ * existing mapping.
+ *
+ * Note that before calling this function again with another mapping request it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
+ * The previously obtained operations must be either processed or abandoned.
+ * To update the &drm_gpuva_manager's view of the GPU VA space
+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
+ * drm_gpuva_destroy_unlocked() should be used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 req_addr, u64 req_range,
+			    struct drm_gem_object *req_obj, u64 req_offset)
+{
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva *va, *prev = NULL;
+	u64 req_end = req_addr + req_range;
+	bool skip_pmerge = false, skip_nmerge = false;
+	int ret;
+
+	if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
+		return ERR_PTR(-EINVAL);
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->node.start;
+		u64 range = va->node.size;
+		u64 end = addr + range;
+
+		/* Generally, we want to skip merging with potential mappings
+		 * left and right of the requested one when we found a
+		 * collision, since merging happens in this loop already.
+		 *
+		 * However, there is one exception when the requested mapping
+		 * spans into a free VM area. If this is the case we might
+		 * still hit the boundary of another mapping before and/or
+		 * after the free VM area.
+		 */
+		skip_pmerge = true;
+		skip_nmerge = true;
+
+		if (addr == req_addr) {
+			bool merge = obj == req_obj &&
+				     offset == req_offset;
+			if (end == req_end) {
+				if (merge)
+					goto done;
+
+				op_unmap_new_to_list(&ops->list, va, false);
+				break;
+			}
+
+			if (end < req_end) {
+				skip_nmerge = false;
+				op_unmap_new_to_list(&ops->list, va, merge);
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = range - req_range,
+					.gem.obj = obj,
+					.gem.offset = offset + req_range,
+				};
+				struct drm_gpuva_op_unmap u = { .va = va };
+
+				if (merge)
+					goto done;
+
+				op_remap_new_to_list(&ops->list, NULL, &n, &u);
+				break;
+			}
+		} else if (addr < req_addr) {
+			u64 ls_range = req_addr - addr;
+			struct drm_gpuva_op_map p = {
+				.va.addr = addr,
+				.va.range = ls_range,
+				.gem.obj = obj,
+				.gem.offset = offset,
+			};
+			struct drm_gpuva_op_unmap u = { .va = va };
+			bool merge = obj == req_obj &&
+				     offset + ls_range == req_offset;
+
+			if (end == req_end) {
+				if (merge)
+					goto done;
+
+				op_remap_new_to_list(&ops->list, &p, NULL, &u);
+				break;
+			}
+
+			if (end < req_end) {
+				u64 new_addr = addr;
+				u64 new_range = req_range + ls_range;
+				u64 new_offset = offset;
+
+				/* We validated that the requested mapping is
+				 * within a single VA region already.
+				 * Since it overlaps the current mapping (which
+				 * can't cross a VA region boundary) we can be
+				 * sure that we're still within the boundaries
+				 * of the same VA region after merging.
+				 */
+				if (merge) {
+					req_offset = new_offset;
+					req_addr = new_addr;
+					req_range = new_range;
+					op_unmap_new_to_list(&ops->list, va, true);
+					goto next;
+				}
+
+				op_remap_new_to_list(&ops->list, &p, NULL, &u);
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + ls_range +
+						      req_range,
+				};
+
+				if (merge)
+					goto done;
+
+				op_remap_new_to_list(&ops->list, &p, &n, &u);
+				break;
+			}
+		} else if (addr > req_addr) {
+			bool merge = obj == req_obj &&
+				     offset == req_offset +
+					       (addr - req_addr);
+			if (!prev)
+				skip_pmerge = false;
+
+			if (end == req_end) {
+				op_unmap_new_to_list(&ops->list, va, merge);
+				break;
+			}
+
+			if (end < req_end) {
+				skip_nmerge = false;
+				op_unmap_new_to_list(&ops->list, va, merge);
+				goto next;
+			}
+
+			if (end > req_end) {
+				struct drm_gpuva_op_map n = {
+					.va.addr = req_end,
+					.va.range = end - req_end,
+					.gem.obj = obj,
+					.gem.offset = offset + req_end - addr,
+				};
+				struct drm_gpuva_op_unmap u = { .va = va };
+				u64 new_end = end;
+				u64 new_range = new_end - req_addr;
+
+				/* We validated that the requested mapping is
+				 * within a single VA region already.
+				 * Since it overlaps the current mapping (which
+				 * can't cross a VA region boundary) we can be
+				 * sure that we're still within the boundaries
+				 * of the same VA region after merging.
+				 */
+				if (merge) {
+					req_end = new_end;
+					req_range = new_range;
+					op_unmap_new_to_list(&ops->list, va, true);
+					break;
+				}
+
+				op_remap_new_to_list(&ops->list, NULL, &n, &u);
+				break;
+			}
+		}
+next:
+		prev = va;
+	}
+
+	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
+	if (va) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->node.start;
+		u64 range = va->node.size;
+		u64 new_offset = offset;
+		u64 new_addr = addr;
+		u64 new_range = req_range + range;
+		bool merge = obj == req_obj &&
+			     offset + range == req_offset;
+
+		/* Don't merge over VA region boundaries. */
+		merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
+		if (merge) {
+			op_unmap_new_to_list(&ops->list, va, true);
+
+			req_offset = new_offset;
+			req_addr = new_addr;
+			req_range = new_range;
+		}
+	}
+
+	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
+	if (va) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->node.start;
+		u64 range = va->node.size;
+		u64 end = addr + range;
+		u64 new_range = req_range + range;
+		u64 new_end = end;
+		bool merge = obj == req_obj &&
+			     offset == req_offset + req_range;
+
+		/* Don't merge over VA region boundaries. */
+		merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
+		if (merge) {
+			op_unmap_new_to_list(&ops->list, va, true);
+
+			req_range = new_range;
+			req_end = new_end;
+		}
+	}
+
+	op_map_new_to_list(&ops->list,
+			   req_addr, req_range,
+			   req_obj, req_offset);
+
+done:
+	return ops;
+
+err_free_ops:
+	drm_gpuva_ops_free(ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
+
+#undef op_map_new_to_list
+#undef op_remap_new_to_list
+#undef op_unmap_new_to_list
+
+/**
+ * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ * @req_addr: the start address of the range to unmap
+ * @req_range: the range of the mappings to unmap
+ *
+ * This function creates a list of operations to perform unmapping and, if
+ * required, splitting of the mappings overlapping the unmap range.
+ *
+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
+ * in the given order. It can contain unmap and remap operations, depending on
+ * whether there are actual overlapping mappings to split.
+ *
+ * There can be an arbitrary number of unmap operations and a maximum of two
+ * remap operations.
+ *
+ * Note that before calling this function again with another range to unmap it
+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
+ * The previously obtained operations must be processed or abandoned.
+ * To update the &drm_gpuva_manager's view of the GPU VA space
+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
+ * drm_gpuva_destroy_unlocked() should be used.
+ *
+ * After the caller finished processing the returned &drm_gpuva_ops, they must
+ * be freed with &drm_gpuva_ops_free.
+ *
+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
+ */
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 req_addr, u64 req_range)
+{
+	struct drm_gpuva_ops *ops;
+	struct drm_gpuva_op *op;
+	struct drm_gpuva_op_remap *r;
+	struct drm_gpuva *va;
+	u64 req_end = req_addr + req_range;
+	int ret;
+
+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ops->list);
+
+	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
+		struct drm_gem_object *obj = va->gem.obj;
+		u64 offset = va->gem.offset;
+		u64 addr = va->node.start;
+		u64 range = va->node.size;
+		u64 end = addr + range;
+
+		op = kzalloc(sizeof(*op), GFP_KERNEL);
+		if (!op) {
+			ret = -ENOMEM;
+			goto err_free_ops;
+		}
+
+		r = &op->remap;
+
+		if (addr < req_addr) {
+			r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
+			if (!r->prev) {
+				ret = -ENOMEM;
+				goto err_free_op;
+			}
+
+			r->prev->va.addr = addr;
+			r->prev->va.range = req_addr - addr;
+			r->prev->gem.obj = obj;
+			r->prev->gem.offset = offset;
+		}
+
+		if (end > req_end) {
+			r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
+			if (!r->next) {
+				ret = -ENOMEM;
+				goto err_free_prev;
+			}
+
+			r->next->va.addr = req_end;
+			r->next->va.range = end - req_end;
+			r->next->gem.obj = obj;
+			r->next->gem.offset = offset + (req_end - addr);
+		}
+
+		if (op->remap.prev || op->remap.next) {
+			op->op = DRM_GPUVA_OP_REMAP;
+			r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
+			if (!r->unmap) {
+				ret = -ENOMEM;
+				goto err_free_next;
+			}
+
+			r->unmap->va = va;
+		} else {
+			op->op = DRM_GPUVA_OP_UNMAP;
+			op->unmap.va = va;
+		}
+
+		list_add_tail(&op->entry, &ops->list);
+	}
+
+	return ops;
+
+err_free_next:
+	if (r->next)
+		kfree(r->next);
+err_free_prev:
+	if (r->prev)
+		kfree(r->prev);
+err_free_op:
+	kfree(op);
+err_free_ops:
+	drm_gpuva_ops_free(ops);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
+
+/**
+ * drm_gpuva_ops_free - free the given &drm_gpuva_ops
+ * @ops: the &drm_gpuva_ops to free
+ *
+ * Frees the given &drm_gpuva_ops structure including all the ops associated
+ * with it.
+ */
+void
+drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
+{
+	struct drm_gpuva_op *op, *next;
+
+	drm_gpuva_for_each_op_safe(op, next, ops) {
+		list_del(&op->entry);
+		if (op->op == DRM_GPUVA_OP_REMAP) {
+			if (op->remap.prev)
+				kfree(op->remap.prev);
+
+			if (op->remap.next)
+				kfree(op->remap.next);
+
+			kfree(op->remap.unmap);
+		}
+		kfree(op);
+	}
+
+	kfree(ops);
+}
+EXPORT_SYMBOL(drm_gpuva_ops_free);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index d7c521e8860f..6feacd93aca6 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -104,6 +104,12 @@ enum drm_driver_feature {
 	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
 	 */
 	DRIVER_COMPUTE_ACCEL            = BIT(7),
+	/**
+	 * @DRIVER_GEM_GPUVA:
+	 *
+	 * Driver supports user defined GPU VA bindings for GEM objects.
+	 */
+	DRIVER_GEM_GPUVA		= BIT(8),
 
 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 772a4adf5287..4a3679034966 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -36,6 +36,8 @@
 
 #include <linux/kref.h>
 #include <linux/dma-resv.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
 
 #include <drm/drm_vma_manager.h>
 
@@ -337,6 +339,17 @@ struct drm_gem_object {
 	 */
 	struct dma_resv _resv;
 
+	/**
+	 * @gpuva:
+	 *
+	 * Provides the list and list mutex of GPU VAs attached to this
+	 * GEM object.
+	 */
+	struct {
+		struct list_head list;
+		struct mutex mutex;
+	} gpuva;
+
 	/**
 	 * @funcs:
 	 *
@@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
 unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
 			       bool (*shrink)(struct drm_gem_object *obj));
 
+/**
+ * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
+ * @obj: the &drm_gem_object
+ *
+ * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
+ * protecting it.
+ *
+ * Calling this function is only necessary for drivers intending to support the
+ * &drm_driver_feature DRIVER_GEM_GPUVA.
+ */
+static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
+{
+	INIT_LIST_HEAD(&obj->gpuva.list);
+	mutex_init(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
+{
+	mutex_lock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
+ * @obj: the &drm_gem_object
+ *
+ * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
+ */
+static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
+{
+	mutex_unlock(&obj->gpuva.mutex);
+}
+
+/**
+ * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gem_object.
+ */
+#define drm_gem_for_each_gpuva(entry, obj) \
+	list_for_each_entry(entry, &obj->gpuva.list, head)
+
+/**
+ * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @next: &next &drm_gpuva to store the next step
+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
+ * it is safe against removal of elements.
+ */
+#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
+	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
+
 #endif /* __DRM_GEM_H__ */
diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
new file mode 100644
index 000000000000..adeb0c916e91
--- /dev/null
+++ b/include/drm/drm_gpuva_mgr.h
@@ -0,0 +1,527 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __DRM_GPUVA_MGR_H__
+#define __DRM_GPUVA_MGR_H__
+
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <drm/drm_mm.h>
+#include <linux/mm.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+struct drm_gpuva_region;
+struct drm_gpuva;
+struct drm_gpuva_ops;
+
+/**
+ * struct drm_gpuva_manager - DRM GPU VA Manager
+ *
+ * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
+ * the &drm_mm range allocator. Typically, this structure is embedded in bigger
+ * driver structures.
+ *
+ * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
+ * pages.
+ *
+ * There should be one manager instance per GPU virtual address space.
+ */
+struct drm_gpuva_manager {
+	/**
+	 * @name: the name of the DRM GPU VA space
+	 */
+	const char *name;
+
+	/**
+	 * @mm_start: start of the VA space
+	 */
+	u64 mm_start;
+
+	/**
+	 * @mm_range: length of the VA space
+	 */
+	u64 mm_range;
+
+	/**
+	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
+	 */
+	struct drm_mm region_mm;
+
+	/**
+	 * @va_mm: the &drm_mm range allocator to track GPU VA mappings
+	 */
+	struct drm_mm va_mm;
+
+	/**
+	 * @kernel_alloc_node:
+	 *
+	 * &drm_mm_node representing the address space cutout reserved for
+	 * the kernel
+	 */
+	struct drm_mm_node kernel_alloc_node;
+};
+
+void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
+			    const char *name,
+			    u64 start_offset, u64 range,
+			    u64 reserve_offset, u64 reserve_range);
+void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
+
+/**
+ * struct drm_gpuva_region - structure to track a portion of GPU VA space
+ *
+ * This structure represents a portion of a GPU's VA space and is associated
+ * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
+ *
+ * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
+ * placed within a &drm_gpuva_region.
+ */
+struct drm_gpuva_region {
+	/**
+	 * @node: the &drm_mm_node to track the GPU VA region
+	 */
+	struct drm_mm_node node;
+
+	/**
+	 * @mgr: the &drm_gpuva_manager this object is associated with
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	/**
+	 * @sparse: indicates whether this region is sparse
+	 */
+	bool sparse;
+};
+
+struct drm_gpuva_region *
+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
+		      u64 addr, u64 range);
+int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
+			    struct drm_gpuva_region *reg,
+			    u64 addr, u64 range);
+void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
+			      struct drm_gpuva_region *reg);
+
+int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
+		     struct drm_gpuva *va,
+		     u64 addr, u64 range);
+/**
+ * drm_gpuva_for_each_region_in_range - iterator to walk over a range of nodes
+ * @node__: &drm_gpuva_region structure to assign to in each iteration step
+ * @gpuva__: &drm_gpuva_manager structure to walk
+ * @start__: starting offset, the first node will overlap this
+ * @end__: ending offset, the last node will start before this (but may overlap)
+ *
+ * This iterator walks over all nodes in the range allocator that lie
+ * between @start and @end. It is implemented similarly to list_for_each(),
+ * but is using &drm_mm's internal interval tree to accelerate the search for
+ * the starting node, and hence isn't safe against removal of elements. It
+ * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
+ * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
+ * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
+ * backing &drm_mm, and may even continue indefinitely.
+ */
+#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
+	for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
+									 (start__), (end__)-1); \
+	     node__->node.start < (end__); \
+	     node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
+
+/**
+ * drm_gpuva_for_each_region - iterator to walk over all regions
+ * @entry: &drm_gpuva_region structure to assign to in each iteration step
+ * @gpuva: &drm_gpuva_manager structure to walk
+ *
+ * This iterator walks over all &drm_gpuva_region structures associated with the
+ * &drm_gpuva_manager.
+ */
+#define drm_gpuva_for_each_region(entry, gpuva) \
+	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
+
+/**
+ * drm_gpuva_for_each_region_safe - iterator to safely walk over all
+ * regions
+ * @entry: &drm_gpuva_region structure to assign to in each iteration step
+ * @next: &next &drm_gpuva_region to store the next step
+ * @gpuva: &drm_gpuva_manager structure to walk
+ *
+ * This iterator walks over all &drm_gpuva_region structures associated with the
+ * &drm_gpuva_manager. It is implemented with list_for_each_safe(), hence it
+ * is safe against removal of elements.
+ */
+#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
+	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
+
+
+/**
+ * enum drm_gpuva_flags - flags for struct drm_gpuva
+ */
+enum drm_gpuva_flags {
+	/**
+	 * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is swapped
+	 */
+	DRM_GPUVA_SWAPPED = (1 << 0),
+};
+
+/**
+ * struct drm_gpuva - structure to track a GPU VA mapping
+ *
+ * This structure represents a GPU VA mapping and is associated with a
+ * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
+ *
+ * Typically, this structure is embedded in bigger driver structures.
+ */
+struct drm_gpuva {
+	/**
+	 * @node: the &drm_mm_node to track the GPU VA mapping
+	 */
+	struct drm_mm_node node;
+
+	/**
+	 * @mgr: the &drm_gpuva_manager this object is associated with
+	 */
+	struct drm_gpuva_manager *mgr;
+
+	/**
+	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
+	 */
+	struct drm_gpuva_region *region;
+
+	/**
+	 * @head: the &list_head to attach this object to a &drm_gem_object
+	 */
+	struct list_head head;
+
+	/**
+	 * @flags: the &drm_gpuva_flags for this mapping
+	 */
+	enum drm_gpuva_flags flags;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and its offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the mapped &drm_gem_object
+		 */
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+void drm_gpuva_link_locked(struct drm_gpuva *va);
+void drm_gpuva_link_unlocked(struct drm_gpuva *va);
+void drm_gpuva_unlink_locked(struct drm_gpuva *va);
+void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
+
+void drm_gpuva_destroy_locked(struct drm_gpuva *va);
+void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
+
+struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
+				 u64 addr, u64 range);
+struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
+struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
+
+/**
+ * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva is swapped
+ * @va: the &drm_gpuva to set the swap flag of
+ * @swap: indicates whether the &drm_gpuva is swapped
+ */
+static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
+{
+	if (swap)
+		va->flags |= DRM_GPUVA_SWAPPED;
+	else
+		va->flags &= ~DRM_GPUVA_SWAPPED;
+}
+
+/**
+ * drm_gpuva_swapped - indicates whether the backing BO of this &drm_gpuva
+ * is swapped
+ * @va: the &drm_gpuva to check
+ */
+static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
+{
+	return va->flags & DRM_GPUVA_SWAPPED;
+}
+
+/**
+ * drm_gpuva_for_each_va_in_range - iterator to walk over a range of nodes
+ * @node__: &drm_gpuva structure to assign to in each iteration step
+ * @gpuva__: &drm_gpuva_manager structure to walk
+ * @start__: starting offset, the first node will overlap this
+ * @end__: ending offset, the last node will start before this (but may overlap)
+ *
+ * This iterator walks over all nodes in the range allocator that lie
+ * between @start and @end. It is implemented similarly to list_for_each(),
+ * but is using &drm_mm's internal interval tree to accelerate the search for
+ * the starting node, and hence isn't safe against removal of elements. It
+ * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
+ * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
+ * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
+ * backing &drm_mm, and may even continue indefinitely.
+ */
+#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
+	for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
+								  (start__), (end__)-1); \
+	     node__->node.start < (end__); \
+	     node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
+
+/**
+ * drm_gpuva_for_each_va - iterator to walk over all &drm_gpuva structures
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @gpuva: &drm_gpuva_manager structure to walk
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuva_manager.
+ */
+#define drm_gpuva_for_each_va(entry, gpuva) \
+	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
+
+/**
+ * drm_gpuva_for_each_va_safe - iterator to safely walk over all &drm_gpuva
+ * structures
+ * @entry: &drm_gpuva structure to assign to in each iteration step
+ * @next: &next &drm_gpuva to store the next step
+ * @gpuva: &drm_gpuva_manager structure to walk
+ *
+ * This iterator walks over all &drm_gpuva structures associated with the
+ * &drm_gpuva_manager. It is implemented with list_for_each_safe(), hence it
+ * is safe against removal of elements.
+ */
+#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
+	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
+
+/**
+ * enum drm_gpuva_op_type - GPU VA operation type
+ *
+ * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager
+ * can be map, remap or unmap operations.
+ */
+enum drm_gpuva_op_type {
+	/**
+	 * @DRM_GPUVA_OP_MAP: the map op type
+	 */
+	DRM_GPUVA_OP_MAP,
+
+	/**
+	 * @DRM_GPUVA_OP_REMAP: the remap op type
+	 */
+	DRM_GPUVA_OP_REMAP,
+
+	/**
+	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
+	 */
+	DRM_GPUVA_OP_UNMAP,
+};
+
+/**
+ * struct drm_gpuva_op_map - GPU VA map operation
+ *
+ * This structure represents a single map operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_map {
+	/**
+	 * @va: structure containing address and range of a map
+	 * operation
+	 */
+	struct {
+		/**
+		 * @addr: the base address of the new mapping
+		 */
+		u64 addr;
+
+		/**
+		 * @range: the range of the new mapping
+		 */
+		u64 range;
+	} va;
+
+	/**
+	 * @gem: structure containing the &drm_gem_object and its offset
+	 */
+	struct {
+		/**
+		 * @offset: the offset within the &drm_gem_object
+		 */
+		u64 offset;
+
+		/**
+		 * @obj: the &drm_gem_object to map
+		 */
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+/**
+ * struct drm_gpuva_op_unmap - GPU VA unmap operation
+ *
+ * This structure represents a single unmap operation generated by the
+ * DRM GPU VA manager.
+ */
+struct drm_gpuva_op_unmap {
+	/**
+	 * @va: the &drm_gpuva to unmap
+	 */
+	struct drm_gpuva *va;
+
+	/**
+	 * @keep:
+	 *
+	 * Indicates whether this &drm_gpuva is physically contiguous with the
+	 * original mapping request.
+	 *
+	 * Optionally, if &keep is set, drivers may keep the actual page table
+	 * mappings for this &drm_gpuva, adding only the missing page table
+	 * entries, and update the &drm_gpuva_manager accordingly.
+	 */
+	bool keep;
+};
+
+/**
+ * struct drm_gpuva_op_remap - GPU VA remap operation
+ *
+ * This represents a single remap operation generated by the DRM GPU VA manager.
+ *
+ * A remap operation is generated when an existing GPU VA mapping is split up
+ * by inserting a new GPU VA mapping or by partially unmapping existing
+ * mapping(s), hence it consists of a maximum of two map operations and one
+ * unmap operation.
+ *
+ * The @unmap operation takes care of removing the original existing mapping.
+ * @prev is used to remap the preceding part, @next the subsequent part.
+ *
+ * If either a new mapping's start address is aligned with the start address
+ * of the old mapping or the new mapping's end address is aligned with the
+ * end address of the old mapping, either @prev or @next is NULL.
+ *
+ * Note, the reason for a dedicated remap operation, rather than arbitrary
+ * unmap and map operations, is to give drivers the chance of extracting driver
+ * specific data for creating the new mappings from the unmap operation's
+ * &drm_gpuva structure which typically is embedded in larger driver specific
+ * structures.
+ */
+struct drm_gpuva_op_remap {
+	/**
+	 * @prev: the preceding part of a split mapping
+	 */
+	struct drm_gpuva_op_map *prev;
+
+	/**
+	 * @next: the subsequent part of a split mapping
+	 */
+	struct drm_gpuva_op_map *next;
+
+	/**
+	 * @unmap: the unmap operation for the original existing mapping
+	 */
+	struct drm_gpuva_op_unmap *unmap;
+};
+
+/**
+ * struct drm_gpuva_op - GPU VA operation
+ *
+ * This structure represents a single generic operation, which can be either
+ * map, unmap or remap.
+ *
+ * The particular type of the operation is defined by @op.
+ */
+struct drm_gpuva_op {
+	/**
+	 * @entry:
+	 *
+	 * The &list_head used to distribute instances of this struct within
+	 * &drm_gpuva_ops.
+	 */
+	struct list_head entry;
+
+	/**
+	 * @op: the type of the operation
+	 */
+	enum drm_gpuva_op_type op;
+
+	union {
+		/**
+		 * @map: the map operation
+		 */
+		struct drm_gpuva_op_map map;
+
+		/**
+		 * @unmap: the unmap operation
+		 */
+		struct drm_gpuva_op_unmap unmap;
+
+		/**
+		 * @remap: the remap operation
+		 */
+		struct drm_gpuva_op_remap remap;
+	};
+};
+
+/**
+ * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
+ */
+struct drm_gpuva_ops {
+	/**
+	 * @list: the &list_head
+	 */
+	struct list_head list;
+};
+
+/**
+ * drm_gpuva_for_each_op - iterator to walk over all ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations.
+ */
+#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
+
+/**
+ * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
+ * @op: &drm_gpuva_op to assign in each iteration step
+ * @next: &next &drm_gpuva_op to store the next step
+ * @ops: &drm_gpuva_ops to walk
+ *
+ * This iterator walks over all ops within a given list of operations. It is
+ * implemented with list_for_each_safe(), so it is safe against removal of elements.
+ */
+#define drm_gpuva_for_each_op_safe(op, next, ops) \
+	list_for_each_entry_safe(op, next, &(ops)->list, entry)
+
+struct drm_gpuva_ops *
+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
+			    u64 addr, u64 range,
+			    struct drm_gem_object *obj, u64 offset);
+struct drm_gpuva_ops *
+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
+			      u64 addr, u64 range);
+void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
+
+#endif /* __DRM_GPUVA_MGR_H__ */
-- 
2.39.0
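
To illustrate how a driver is expected to consume the split/merge operations,
the following rough sketch processes the list returned by
drm_gpuva_sm_map_ops_create(); driver_gpuva_alloc() and the page table
handling are placeholders for driver specific code, and error handling is
omitted:

	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:
			/* program the page tables for op->map, then insert a
			 * driver embedded struct drm_gpuva to update the
			 * manager's view
			 */
			drm_gpuva_insert(mgr, driver_gpuva_alloc(&op->map),
					 op->map.va.addr, op->map.va.range);
			break;
		case DRM_GPUVA_OP_REMAP:
			/* unmap op->remap.unmap->va and re-create the
			 * remaining op->remap.prev / op->remap.next parts
			 */
			break;
		case DRM_GPUVA_OP_UNMAP:
			/* tear down the page tables for op->unmap.va and
			 * destroy it, e.g. with drm_gpuva_destroy_unlocked()
			 */
			break;
		}
	}

	drm_gpuva_ops_free(ops);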


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (2 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18 13:55   ` kernel test robot
  2023-01-18 15:47   ` kernel test robot
  2023-01-18  6:12 ` [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This commit adds a function to dump a DRM GPU VA space and a macro for
drivers to register the struct drm_info_list 'gpuvas' entry.

Most drivers will likely maintain one DRM GPU VA space per struct drm_file.
However, a driver might not have a fixed relation between DRM GPU VA spaces
and a DRM core infrastructure, hence we need the indirection via the driver
iterating its maintained DRM GPU VA spaces.
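
As an illustration only (the driver specific names below are placeholders),
a driver's show callback and debugfs entry might look roughly like this:

	static int driver_debugfs_gpuva(struct seq_file *m, void *data)
	{
		struct drm_info_node *node = (struct drm_info_node *)m->private;
		struct drm_device *dev = node->minor->dev;
		struct driver_device *ddev = to_driver_device(dev);

		/* one call per maintained DRM GPU VA space */
		return drm_debugfs_gpuva_info(m, &ddev->gpuva_mgr);
	}

	static const struct drm_info_list driver_debugfs_list[] = {
		DRM_DEBUGFS_GPUVA_INFO(driver_debugfs_gpuva, NULL),
	};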

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/drm_debugfs.c | 56 +++++++++++++++++++++++++++++++++++
 include/drm/drm_debugfs.h     | 25 ++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 4f643a490dc3..5389dd73c0fb 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -39,6 +39,7 @@
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_managed.h>
+#include <drm/drm_gpuva_mgr.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -175,6 +176,61 @@ static const struct file_operations drm_debugfs_fops = {
 	.release = single_release,
 };
 
+/**
+ * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
+ * @m: pointer to the &seq_file to write
+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
+ *
+ * Dumps the GPU VA regions and mappings of a given DRM GPU VA manager.
+ *
+ * For each DRM GPU VA space drivers should call this function from their
+ * &drm_info_list's show callback.
+ *
+ * Returns: 0 on success, -ENODEV if the &mgr is not initialized
+ */
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr)
+{
+	struct drm_gpuva_region *reg;
+	struct drm_gpuva *va;
+
+	if (!mgr->name)
+		return -ENODEV;
+
+	seq_printf(m, "DRM GPU VA space (%s)\n", mgr->name);
+	seq_puts  (m, "\n");
+	seq_puts  (m, " VA regions  | start              | range              | end                | sparse\n");
+	seq_puts  (m, "------------------------------------------------------------------------------------\n");
+	seq_printf(m, " VA space    | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
+		   mgr->mm_start, mgr->mm_range, mgr->mm_start + mgr->mm_range);
+	seq_puts  (m, "-----------------------------------------------------------------------------------\n");
+	drm_gpuva_for_each_region(reg, mgr) {
+		struct drm_mm_node *node = &reg->node;
+
+		if (node == &mgr->kernel_alloc_node) {
+			seq_printf(m, " kernel node | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
+				   node->start, node->size, node->start + node->size);
+			continue;
+		}
+
+		seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx | %s\n",
+			   node->start, node->size, node->start + node->size,
+			   reg->sparse ? "true" : "false");
+	}
+	seq_puts(m, "\n");
+	seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
+	seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
+	drm_gpuva_for_each_va(va, mgr) {
+		struct drm_mm_node *node = &va->node;
+
+		seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
+			   node->start, node->size, node->start + node->size,
+			   (u64)(uintptr_t)va->gem.obj, va->gem.offset);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_debugfs_gpuva_info);
 
 /**
  * drm_debugfs_create_files - Initialize a given set of debugfs files for DRM
diff --git a/include/drm/drm_debugfs.h b/include/drm/drm_debugfs.h
index 7616f457ce70..cb2c1956a214 100644
--- a/include/drm/drm_debugfs.h
+++ b/include/drm/drm_debugfs.h
@@ -34,6 +34,22 @@
 
 #include <linux/types.h>
 #include <linux/seq_file.h>
+
+#include <drm/drm_gpuva_mgr.h>
+
+/**
+ * DRM_DEBUGFS_GPUVA_INFO - &drm_info_list entry to dump a GPU VA space
+ * @show: the &drm_info_list's show callback
+ * @data: driver private data
+ *
+ * Drivers should use this macro to define a &drm_info_list entry to provide a
+ * debugfs file for dumping the GPU VA space regions and mappings.
+ *
+ * For each DRM GPU VA space drivers should call drm_debugfs_gpuva_info() from
+ * their @show callback.
+ */
+#define DRM_DEBUGFS_GPUVA_INFO(show, data) {"gpuvas", show, DRIVER_GEM_GPUVA, data}
+
 /**
  * struct drm_info_list - debugfs info list entry
  *
@@ -134,6 +150,9 @@ void drm_debugfs_add_file(struct drm_device *dev, const char *name,
 
 void drm_debugfs_add_files(struct drm_device *dev,
 			   const struct drm_debugfs_info *files, int count);
+
+int drm_debugfs_gpuva_info(struct seq_file *m,
+			   struct drm_gpuva_manager *mgr);
 #else
 static inline void drm_debugfs_create_files(const struct drm_info_list *files,
 					    int count, struct dentry *root,
@@ -155,6 +174,12 @@ static inline void drm_debugfs_add_files(struct drm_device *dev,
 					 const struct drm_debugfs_info *files,
 					 int count)
 {}
+
+static inline int drm_debugfs_gpuva_info(struct seq_file *m,
+					 struct drm_gpuva_manager *mgr)
+{
+	return 0;
+}
 #endif
 
 #endif /* _DRM_DEBUGFS_H_ */
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (3 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-27  1:05   ` Matthew Brost
  2023-01-18  6:12 ` [PATCH drm-next 06/14] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This commit provides the interfaces for the new UAPI motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
   DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
   VA area.

2) Bind and unbind GPU VA space mappings via the new
   DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
asynchronous processing with DRM syncobjs as synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing;
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
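
To sketch the intended usage (illustrative only; fd, bo_handle and the
addresses are assumptions, drmIoctl() is the usual libdrm wrapper), a UMD
synchronously mapping a buffer into a previously allocated VA space region
might do:

	struct drm_nouveau_vm_bind_op op = {
		.op = DRM_NOUVEAU_VM_BIND_OP_MAP,
		.handle = bo_handle,
		.addr = 0x1000000,
		.bo_offset = 0,
		.range = 0x10000,
	};
	struct drm_nouveau_vm_bind bind = {
		.op_count = 1,
		.op_ptr = (uintptr_t)&op,
		/* no DRM_NOUVEAU_VM_BIND_RUN_ASYNC, hence no syncobjs */
	};
	int ret = drmIoctl(fd, DRM_IOCTL_NOUVEAU_VM_BIND, &bind);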

Co-authored-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/driver-uapi.rst |   8 ++
 include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+)

diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
index 4411e6919a3d..9c7ca6e33a68 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -6,3 +6,11 @@ drm/i915 uAPI
 =============
 
 .. kernel-doc:: include/uapi/drm/i915_drm.h
+
+drm/nouveau uAPI
+================
+
+VM_BIND / EXEC uAPI
+-------------------
+
+.. kernel-doc:: include/uapi/drm/nouveau_drm.h
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 853a327433d3..f6e7d40201d4 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
 	__u32 handle;
 };
 
+/**
+ * struct drm_nouveau_sync - sync object
+ *
+ * This structure serves as synchronization mechanism for (potentially)
+ * asynchronous operations such as EXEC or VM_BIND.
+ */
+struct drm_nouveau_sync {
+	/**
+	 * @flags: the flags for a sync object
+	 *
+	 * The first 8 bits are used to determine the type of the sync object.
+	 */
+	__u32 flags;
+#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
+#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
+#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
+	/**
+	 * @handle: the handle of the sync object
+	 */
+	__u32 handle;
+	/**
+	 * @timeline_value:
+	 *
+	 * The timeline point of the sync object in case the syncobj is of
+	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
+	 */
+	__u64 timeline_value;
+};
+
+/**
+ * struct drm_nouveau_vm_init - GPU VA space init structure
+ *
+ * Used to initialize the GPU's VA space for a user client, telling the kernel
+ * which portion of the VA space is managed by the UMD and kernel respectively.
+ */
+struct drm_nouveau_vm_init {
+	/**
+	 * @unmanaged_addr: start address of the kernel managed VA space region
+	 */
+	__u64 unmanaged_addr;
+	/**
+	 * @unmanaged_size: size of the kernel managed VA space region in bytes
+	 */
+	__u64 unmanaged_size;
+};
+
+/**
+ * struct drm_nouveau_vm_bind_op - VM_BIND operation
+ *
+ * This structure represents a single VM_BIND operation. UMDs should pass
+ * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
+ */
+struct drm_nouveau_vm_bind_op {
+	/**
+	 * @op: the operation type
+	 */
+	__u32 op;
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
+ *
+ * The alloc operation is used to reserve a VA space region within the GPU's VA
+ * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
+ * instruct the kernel to create sparse mappings for the given region.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_MAP:
+ *
+ * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
+ * a previously allocated VA space region. If the region is sparse, existing
+ * sparse mappings are overwritten.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
+/**
+ * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
+ *
+ * Unmap an existing mapping in the GPU's VA space. If the region the mapping
+ * is located in is a sparse region, new sparse mappings are created where the
+ * unmapped (memory backed) mapping was mapped previously.
+ */
+#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind_op
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_SPARSE:
+ *
+ * Indicates that an allocated VA space region should be sparse.
+ */
+#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
+	/**
+	 * @handle: the handle of the DRM GEM object to map
+	 */
+	__u32 handle;
+	/**
+	 * @addr:
+	 *
+	 * the address the VA space region or (memory backed) mapping should be mapped to
+	 */
+	__u64 addr;
+	/**
+	 * @bo_offset: the offset within the BO backing the mapping
+	 */
+	__u64 bo_offset;
+	/**
+	 * @range: the size of the requested mapping in bytes
+	 */
+	__u64 range;
+};
+
+/**
+ * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
+ */
+struct drm_nouveau_vm_bind {
+	/**
+	 * @op_count: the number of &drm_nouveau_vm_bind_op
+	 */
+	__u32 op_count;
+	/**
+	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
+	 */
+	__u32 flags;
+/**
+ * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
+ *
+ * Indicates that the given VM_BIND operation should be executed asynchronously
+ * by the kernel.
+ *
+ * If this flag is not supplied the kernel executes the associated operations
+ * synchronously and doesn't accept any &drm_nouveau_sync objects.
+ */
+#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
+	 */
+	__u64 op_ptr;
+};
+
+/**
+ * struct drm_nouveau_exec_push - EXEC push operation
+ *
+ * This structure represents a single EXEC push operation. UMDs should pass an
+ * array of this structure via struct drm_nouveau_exec's &push_ptr field.
+ */
+struct drm_nouveau_exec_push {
+	/**
+	 * @va: the virtual address of the push buffer mapping
+	 */
+	__u64 va;
+	/**
+	 * @va_len: the length of the push buffer mapping
+	 */
+	__u64 va_len;
+};
+
+/**
+ * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
+ */
+struct drm_nouveau_exec {
+	/**
+	 * @channel: the channel to execute the push buffer in
+	 */
+	__u32 channel;
+	/**
+	 * @push_count: the number of &drm_nouveau_exec_push ops
+	 */
+	__u32 push_count;
+	/**
+	 * @wait_count: the number of wait &drm_nouveau_syncs
+	 */
+	__u32 wait_count;
+	/**
+	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
+	 */
+	__u32 sig_count;
+	/**
+	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
+	 */
+	__u64 wait_ptr;
+	/**
+	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
+	 */
+	__u64 sig_ptr;
+	/**
+	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
+	 */
+	__u64 push_ptr;
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
@@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
 #define DRM_NOUVEAU_NVIF               0x07
 #define DRM_NOUVEAU_SVM_INIT           0x08
 #define DRM_NOUVEAU_SVM_BIND           0x09
+#define DRM_NOUVEAU_VM_INIT            0x10
+#define DRM_NOUVEAU_VM_BIND            0x11
+#define DRM_NOUVEAU_EXEC               0x12
 #define DRM_NOUVEAU_GEM_NEW            0x40
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
@@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
 #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
 
+#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
+#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
+#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
 #if defined(__cplusplus)
 }
 #endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 06/14] drm/nouveau: get vmm via nouveau_cli_vmm()
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (4 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 07/14] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Provide a getter function for the client's current vmm context. Since
we'll add a new (u)vmm context for UMD bindings in subsequent commits,
this will keep the code clean.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h  | 9 +++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c  | 6 +++---
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 335fa91ca4ad..d2b32a47e480 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -204,7 +204,7 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	struct nouveau_drm *drm = cli->drm;
 	struct nouveau_bo *nvbo;
 	struct nvif_mmu *mmu = &cli->mmu;
-	struct nvif_vmm *vmm = cli->svm.cli ? &cli->svm.vmm : &cli->vmm.vmm;
+	struct nvif_vmm *vmm = &nouveau_cli_vmm(cli)->vmm;
 	int i, pi = -1;
 
 	if (!*size) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index e648ecd0c1a0..1068abe41024 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -148,7 +148,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
 
 	chan->device = device;
 	chan->drm = drm;
-	chan->vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	chan->vmm = nouveau_cli_vmm(cli);
 	atomic_set(&chan->killed, 0);
 
 	/* allocate memory for dma push buffer */
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b5de312a523f..81350e685b50 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -112,6 +112,15 @@ struct nouveau_cli_work {
 	struct dma_fence_cb cb;
 };
 
+static inline struct nouveau_vmm *
+nouveau_cli_vmm(struct nouveau_cli *cli)
+{
+	if (cli->svm.cli)
+		return &cli->svm;
+
+	return &cli->vmm;
+}
+
 void nouveau_cli_work_queue(struct nouveau_cli *, struct dma_fence *,
 			    struct nouveau_cli_work *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index ac5793c96957..48e6ba00ec27 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -103,7 +103,7 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -180,7 +180,7 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : & cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
 
@@ -269,7 +269,7 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
 {
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
-	struct nouveau_vmm *vmm = cli->svm.cli ? &cli->svm : &cli->vmm;
+	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 
 	if (is_power_of_2(nvbo->valid_domains))
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 07/14] drm/nouveau: bo: initialize GEM GPU VA interface
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (5 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 06/14] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 08/14] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Initialize the GEM's DRM GPU VA manager interface in preparation for the
(u)vmm implementation, provided by subsequent commits, which will make use
of it.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index d2b32a47e480..4cdeda7fe2df 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -215,11 +215,14 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 	nvbo = kzalloc(sizeof(struct nouveau_bo), GFP_KERNEL);
 	if (!nvbo)
 		return ERR_PTR(-ENOMEM);
+
 	INIT_LIST_HEAD(&nvbo->head);
 	INIT_LIST_HEAD(&nvbo->entry);
 	INIT_LIST_HEAD(&nvbo->vma_list);
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	drm_gem_gpuva_init(&nvbo->bo.base);
+
 	/* This is confusing, and doesn't actually mean we want an uncached
 	 * mapping, but is what NOUVEAU_GEM_DOMAIN_COHERENT gets translated
 	 * into in nouveau_gem_new().
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 08/14] drm/nouveau: move usercopy helpers to nouveau_drv.h
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (6 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 07/14] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 09/14] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Move the usercopy helpers to a common driver header file to make them
usable for the new API added in subsequent commits.
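
For illustration, a new ioctl could copy in a user space array with these
helpers roughly as follows (a sketch only, not the implementation added in
later commits):

	struct drm_nouveau_vm_bind *req = data;
	struct drm_nouveau_vm_bind_op *ops;

	ops = u_memcpya(req->op_ptr, req->op_count, sizeof(*ops));
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	/* ... validate and process the operations ... */

	u_free(ops);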

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 26 ++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c | 26 --------------------------
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 81350e685b50..20a7f31b9082 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -130,6 +130,32 @@ nouveau_cli(struct drm_file *fpriv)
 	return fpriv ? fpriv->driver_priv : NULL;
 }
 
+static inline void
+u_free(void *addr)
+{
+	kvfree(addr);
+}
+
+static inline void *
+u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
+{
+	void *mem;
+	void __user *userptr = (void __force __user *)(uintptr_t)user;
+
+	size *= nmemb;
+
+	mem = kvmalloc(size, GFP_KERNEL);
+	if (!mem)
+		return ERR_PTR(-ENOMEM);
+
+	if (copy_from_user(mem, userptr, size)) {
+		u_free(mem);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return mem;
+}
+
 #include <nvif/object.h>
 #include <nvif/parent.h>
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 48e6ba00ec27..5dad2d0dd5cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -613,32 +613,6 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	return 0;
 }
 
-static inline void
-u_free(void *addr)
-{
-	kvfree(addr);
-}
-
-static inline void *
-u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
-{
-	void *mem;
-	void __user *userptr = (void __force __user *)(uintptr_t)user;
-
-	size *= nmemb;
-
-	mem = kvmalloc(size, GFP_KERNEL);
-	if (!mem)
-		return ERR_PTR(-ENOMEM);
-
-	if (copy_from_user(mem, userptr, size)) {
-		u_free(mem);
-		return ERR_PTR(-EFAULT);
-	}
-
-	return mem;
-}
-
 static int
 nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 				struct drm_nouveau_gem_pushbuf *req,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 09/14] drm/nouveau: fence: fail to emit when fence context is killed
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (7 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 08/14] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 10/14] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs processing push buffers and emitting
fences.

If a fence context is killed, e.g. due to a channel fault, jobs which
are already queued for execution might still emit new fences. In such a
case a job would hang forever.

To fix that, fail to emit a new fence on a killed fence context with
-ENODEV to unblock the job.
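
For illustration, a sketch of how a caller, e.g. an asynchronous job's
run path, could handle this; the job structure is a placeholder, only
nouveau_fence_emit() and its -ENODEV error are taken from this patch:

    ret = nouveau_fence_emit(fence, job->chan);
    if (ret == -ENODEV) {
        /* The fence context was killed; fail the job instead of
         * waiting on a fence that will never signal.
         */
        return ret;
    }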

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 7 +++++++
 drivers/gpu/drm/nouveau/nouveau_fence.h | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166..62c70d9a32e6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -96,6 +96,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 		if (nouveau_fence_signal(fence))
 			nvif_event_block(&fctx->event);
 	}
+	fctx->killed = 1;
 	spin_unlock_irqrestore(&fctx->lock, flags);
 }
 
@@ -226,6 +227,12 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 		dma_fence_get(&fence->base);
 		spin_lock_irq(&fctx->lock);
 
+		if (unlikely(fctx->killed)) {
+			spin_unlock_irq(&fctx->lock);
+			dma_fence_put(&fence->base);
+			return -ENODEV;
+		}
+
 		if (nouveau_fence_update(chan, fctx))
 			nvif_event_block(&fctx->event);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 0ca2bc85adf6..00a08699bb58 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -45,7 +45,7 @@ struct nouveau_fence_chan {
 	char name[32];
 
 	struct nvif_event event;
-	int notify_ref, dead;
+	int notify_ref, dead, killed;
 };
 
 struct nouveau_fence_priv {
-- 
2.39.0



* [PATCH drm-next 10/14] drm/nouveau: chan: provide nouveau_channel_kill()
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (8 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 09/14] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs to process push buffers and emit fences.

If a job times out, we need a way to recover from this situation. For
now, simply kill the channel to unblock all hung-up jobs and, on the
next EXEC or VM_BIND ioctl, signal userspace that the device is dead.
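
For illustration, a sketch of how a job timeout handler, e.g. a DRM
scheduler timedout_job() callback, could use this; the callback wiring,
nouveau_job and to_nouveau_job() are placeholders, only
nouveau_channel_kill() is taken from this patch:

    static enum drm_gpu_sched_stat
    nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
    {
        struct nouveau_job *job = to_nouveau_job(sched_job);

        /* Kill the channel; already queued jobs now fail to emit
         * fences with -ENODEV and get unblocked.
         */
        nouveau_channel_kill(job->chan);

        return DRM_GPU_SCHED_STAT_ENODEV;
    }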

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_chan.c | 14 +++++++++++---
 drivers/gpu/drm/nouveau/nouveau_chan.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 1068abe41024..6f47e997d9cf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -40,6 +40,14 @@ MODULE_PARM_DESC(vram_pushbuf, "Create DMA push buffers in VRAM");
 int nouveau_vram_pushbuf;
 module_param_named(vram_pushbuf, nouveau_vram_pushbuf, int, 0400);
 
+void
+nouveau_channel_kill(struct nouveau_channel *chan)
+{
+	atomic_set(&chan->killed, 1);
+	if (chan->fence)
+		nouveau_fence_context_kill(chan->fence, -ENODEV);
+}
+
 static int
 nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 {
@@ -47,9 +55,9 @@ nouveau_channel_killed(struct nvif_event *event, void *repv, u32 repc)
 	struct nouveau_cli *cli = (void *)chan->user.client;
 
 	NV_PRINTK(warn, cli, "channel %d killed!\n", chan->chid);
-	atomic_set(&chan->killed, 1);
-	if (chan->fence)
-		nouveau_fence_context_kill(chan->fence, -ENODEV);
+
+	if (unlikely(!atomic_read(&chan->killed)))
+		nouveau_channel_kill(chan);
 
 	return NVIF_EVENT_DROP;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.h b/drivers/gpu/drm/nouveau/nouveau_chan.h
index e06a8ffed31a..e483f4a254da 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.h
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.h
@@ -65,6 +65,7 @@ int  nouveau_channel_new(struct nouveau_drm *, struct nvif_device *, bool priv,
 			 u32 vram, u32 gart, struct nouveau_channel **);
 void nouveau_channel_del(struct nouveau_channel **);
 int  nouveau_channel_idle(struct nouveau_channel *);
+void nouveau_channel_kill(struct nouveau_channel *);
 
 extern int nouveau_vram_pushbuf;
 
-- 
2.39.0



* [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (9 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 10/14] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-20  3:37   ` kernel test robot
  2023-01-18  6:12 ` [PATCH drm-next 12/14] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

The new VM_BIND UAPI uses the DRM GPU VA manager to manage the VA space.
Hence, we need a way to manipulate the MMU's page tables without going
through the internal range allocator implemented by nvkm/vmm.

This patch adds a raw interface for nvkm/vmm to pass the responsibility
for managing the address space and the corresponding map/unmap/sparse
operations to the upper layers.
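
For illustration, the rough calling sequence an upper layer might use
with the new raw interface; error handling is trimmed and the argument
values are placeholders:

    u64 handle;

    /* Reserve sparse page tables for a userspace-managed region. */
    ret = nvif_vmm_raw_sparse(vmm, addr, size, true);

    /* Map memory at a caller-chosen address within that region. */
    ret = nvif_vmm_raw_map(vmm, addr, size, argv, argc, mem, offset, &handle);

    /* Unmap again, keeping the sparse mapping in place. */
    ret = nvif_vmm_raw_unmap(vmm, handle, true);

    /* Finally, release the sparse page tables. */
    ret = nvif_vmm_raw_sparse(vmm, addr, size, false);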

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |  23 ++-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |  17 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |  10 ++
 drivers/gpu/drm/nouveau/nouveau_svm.c         |   2 +-
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |   4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |  73 +++++++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    | 168 +++++++++++++++++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |   1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  32 +++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   3 +
 10 files changed, 319 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index 9c7ff56831c5..d30e32fb8628 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -3,7 +3,10 @@
 struct nvif_vmm_v0 {
 	__u8  version;
 	__u8  page_nr;
-	__u8  managed;
+#define NVIF_VMM_V0_TYPE_UNMANAGED                                         0x00
+#define NVIF_VMM_V0_TYPE_MANAGED                                           0x01
+#define NVIF_VMM_V0_TYPE_RAW                                               0x02
+	__u8  type;
 	__u8  pad03[5];
 	__u64 addr;
 	__u64 size;
@@ -17,6 +20,7 @@ struct nvif_vmm_v0 {
 #define NVIF_VMM_V0_UNMAP                                                  0x04
 #define NVIF_VMM_V0_PFNMAP                                                 0x05
 #define NVIF_VMM_V0_PFNCLR                                                 0x06
+#define NVIF_VMM_V0_RAW                                                    0x07
 #define NVIF_VMM_V0_MTHD(i)                                         ((i) + 0x80)
 
 struct nvif_vmm_page_v0 {
@@ -66,6 +70,23 @@ struct nvif_vmm_unmap_v0 {
 	__u64 addr;
 };
 
+struct nvif_vmm_raw_v0 {
+	__u8 version;
+#define NVIF_VMM_RAW_V0_MAP	0x0
+#define NVIF_VMM_RAW_V0_UNMAP	0x1
+#define NVIF_VMM_RAW_V0_SPARSE	0x2
+	__u8  op;
+	__u8  sparse;
+	__u8  ref;
+	__u8  pad04[4];
+	__u64 addr;
+	__u64 size;
+	__u64 offset;
+	__u64 memory;
+	__u64 handle;
+	__u8  data[];
+};
+
 struct nvif_vmm_pfnmap_v0 {
 	__u8  version;
 	__u8  page;
diff --git a/drivers/gpu/drm/nouveau/include/nvif/vmm.h b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
index a2ee92201ace..4d0781740336 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/vmm.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/vmm.h
@@ -4,6 +4,12 @@
 struct nvif_mem;
 struct nvif_mmu;
 
+enum nvif_vmm_type {
+	UNMANAGED,
+	MANAGED,
+	RAW,
+};
+
 enum nvif_vmm_get {
 	ADDR,
 	PTES,
@@ -30,8 +36,9 @@ struct nvif_vmm {
 	int page_nr;
 };
 
-int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass, bool managed,
-		  u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *);
+int nvif_vmm_ctor(struct nvif_mmu *, const char *name, s32 oclass,
+		  enum nvif_vmm_type, u64 addr, u64 size, void *argv, u32 argc,
+		  struct nvif_vmm *);
 void nvif_vmm_dtor(struct nvif_vmm *);
 int nvif_vmm_get(struct nvif_vmm *, enum nvif_vmm_get, bool sparse,
 		 u8 page, u8 align, u64 size, struct nvif_vma *);
@@ -39,4 +46,10 @@ void nvif_vmm_put(struct nvif_vmm *, struct nvif_vma *);
 int nvif_vmm_map(struct nvif_vmm *, u64 addr, u64 size, void *argv, u32 argc,
 		 struct nvif_mem *, u64 offset);
 int nvif_vmm_unmap(struct nvif_vmm *, u64);
+int nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 handle, bool sparse);
+int nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size,
+		     void *argv, u32 argc,
+		     struct nvif_mem *mem, u64 offset,
+		     u64 *handle);
+int nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref);
 #endif
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
index 70e7887ef4b4..ec284c1792b3 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/mmu.h
@@ -31,6 +31,16 @@ struct nvkm_vmm {
 
 	u64 start;
 	u64 limit;
+	struct {
+		struct {
+			u64 addr;
+			u64 size;
+		} p;
+		struct {
+			u64 addr;
+			u64 size;
+		} n;
+	} managed;
 
 	struct nvkm_vmm_pt *pd;
 	struct list_head join;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a74ba8d84ba7..186351ecf72f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -350,7 +350,7 @@ nouveau_svmm_init(struct drm_device *dev, void *data,
 	 * VMM instead of the standard one.
 	 */
 	ret = nvif_vmm_ctor(&cli->mmu, "svmVmm",
-			    cli->vmm.vmm.object.oclass, true,
+			    cli->vmm.vmm.object.oclass, MANAGED,
 			    args->unmanaged_addr, args->unmanaged_size,
 			    &(struct gp100_vmm_v0) {
 				.fault_replay = true,
diff --git a/drivers/gpu/drm/nouveau/nouveau_vmm.c b/drivers/gpu/drm/nouveau/nouveau_vmm.c
index 67d6619fcd5e..a6602c012671 100644
--- a/drivers/gpu/drm/nouveau/nouveau_vmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_vmm.c
@@ -128,8 +128,8 @@ nouveau_vmm_fini(struct nouveau_vmm *vmm)
 int
 nouveau_vmm_init(struct nouveau_cli *cli, s32 oclass, struct nouveau_vmm *vmm)
 {
-	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, false, PAGE_SIZE,
-				0, NULL, 0, &vmm->vmm);
+	int ret = nvif_vmm_ctor(&cli->mmu, "drmVmm", oclass, UNMANAGED,
+				PAGE_SIZE, 0, NULL, 0, &vmm->vmm);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nvif/vmm.c b/drivers/gpu/drm/nouveau/nvif/vmm.c
index 6053d6dc2184..a0ca5329b3ef 100644
--- a/drivers/gpu/drm/nouveau/nvif/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvif/vmm.c
@@ -104,6 +104,63 @@ nvif_vmm_get(struct nvif_vmm *vmm, enum nvif_vmm_get type, bool sparse,
 	return ret;
 }
 
+int
+nvif_vmm_raw_unmap(struct nvif_vmm *vmm, u64 handle, bool sparse)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_UNMAP,
+		.handle = handle,
+		.sparse = sparse,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
+int
+nvif_vmm_raw_map(struct nvif_vmm *vmm, u64 addr, u64 size, void *argv, u32 argc,
+		 struct nvif_mem *mem, u64 offset, u64 *handle)
+{
+	struct nvif_vmm_raw_v0 *args;
+	int ret;
+
+	if (!(args = kzalloc(sizeof(*args) + argc, GFP_KERNEL)))
+		return -ENOMEM;
+
+	args->version = 0;
+	args->op = NVIF_VMM_RAW_V0_MAP;
+	args->addr = addr;
+	args->size = size;
+	args->memory = nvif_handle(&mem->object);
+	args->offset = offset;
+	memcpy(args->data, argv, argc);
+
+	ret = nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+			       args, sizeof(*args) + argc);
+
+	if (likely(!ret))
+		*handle = args->handle;
+
+	kfree(args);
+	return ret;
+}
+
+int
+nvif_vmm_raw_sparse(struct nvif_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	struct nvif_vmm_raw_v0 args = {
+		.version = 0,
+		.op = NVIF_VMM_RAW_V0_SPARSE,
+		.addr = addr,
+		.size = size,
+		.ref = ref,
+	};
+
+	return nvif_object_mthd(&vmm->object, NVIF_VMM_V0_RAW,
+				&args, sizeof(args));
+}
+
 void
 nvif_vmm_dtor(struct nvif_vmm *vmm)
 {
@@ -112,8 +169,9 @@ nvif_vmm_dtor(struct nvif_vmm *vmm)
 }
 
 int
-nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
-	      u64 addr, u64 size, void *argv, u32 argc, struct nvif_vmm *vmm)
+nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass,
+	      enum nvif_vmm_type type, u64 addr, u64 size, void *argv, u32 argc,
+	      struct nvif_vmm *vmm)
 {
 	struct nvif_vmm_v0 *args;
 	u32 argn = sizeof(*args) + argc;
@@ -125,9 +183,18 @@ nvif_vmm_ctor(struct nvif_mmu *mmu, const char *name, s32 oclass, bool managed,
 	if (!(args = kmalloc(argn, GFP_KERNEL)))
 		return -ENOMEM;
 	args->version = 0;
-	args->managed = managed;
 	args->addr = addr;
 	args->size = size;
+
+	switch (type) {
+	case UNMANAGED: args->type = NVIF_VMM_V0_TYPE_UNMANAGED; break;
+	case MANAGED: args->type = NVIF_VMM_V0_TYPE_MANAGED; break;
+	case RAW: args->type = NVIF_VMM_V0_TYPE_RAW; break;
+	default:
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
 	memcpy(args->data, argv, argc);
 
 	ret = nvif_object_ctor(&mmu->object, name ? name : "nvifVmm", 0,
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
index 524cd3c0e3fe..c9fac5654baf 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c
@@ -42,6 +42,26 @@ nvkm_uvmm_search(struct nvkm_client *client, u64 handle)
 	return nvkm_vmm_ref(nvkm_uvmm(object)->vmm);
 }
 
+static bool
+nvkm_uvmm_in_managed_range(struct nvkm_uvmm *uvmm, u64 start, u64 size)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+
+	u64 p_start = vmm->managed.p.addr;
+	u64 p_end = p_start + vmm->managed.p.size;
+	u64 n_start = vmm->managed.n.addr;
+	u64 n_end = n_start + vmm->managed.n.size;
+	u64 end = start + size;
+
+	if (start >= p_start && end <= p_end)
+		return true;
+
+	if (start >= n_start && end <= n_end)
+		return true;
+
+	return false;
+}
+
 static int
 nvkm_uvmm_mthd_pfnclr(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 {
@@ -58,6 +78,9 @@ nvkm_uvmm_mthd_pfnclr(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_uvmm_in_managed_range(uvmm, addr, size) && uvmm->raw)
+		return -EINVAL;
+
 	if (size) {
 		mutex_lock(&vmm->mutex);
 		ret = nvkm_vmm_pfn_unmap(vmm, addr, size);
@@ -88,6 +111,9 @@ nvkm_uvmm_mthd_pfnmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_uvmm_in_managed_range(uvmm, addr, size) && uvmm->raw)
+		return -EINVAL;
+
 	if (size) {
 		mutex_lock(&vmm->mutex);
 		ret = nvkm_vmm_pfn_map(vmm, page, addr, size, phys);
@@ -113,6 +139,9 @@ nvkm_uvmm_mthd_unmap(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_uvmm_in_managed_range(uvmm, addr, 0) && uvmm->raw)
+		return -EINVAL;
+
 	mutex_lock(&vmm->mutex);
 	vma = nvkm_vmm_node_search(vmm, addr);
 	if (ret = -ENOENT, !vma || vma->addr != addr) {
@@ -159,6 +188,9 @@ nvkm_uvmm_mthd_map(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	} else
 		return ret;
 
+	if (nvkm_uvmm_in_managed_range(uvmm, addr, size) && uvmm->raw)
+		return -EINVAL;
+
 	memory = nvkm_umem_search(client, handle);
 	if (IS_ERR(memory)) {
 		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
@@ -314,6 +346,131 @@ nvkm_uvmm_mthd_page(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
 	return 0;
 }
 
+static int
+nvkm_uvmm_mthd_raw_map(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args,
+		       void *argv, u32 argc)
+{
+	struct nvkm_client *client = uvmm->object.client;
+	u64 addr, size, handle, offset;
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	struct nvkm_vma *vma;
+	struct nvkm_memory *memory;
+	int ret;
+
+	addr = args->addr;
+	size = args->size;
+	handle = args->memory;
+	offset = args->offset;
+
+	if (!nvkm_uvmm_in_managed_range(uvmm, addr, size))
+		return -EINVAL;
+
+	memory = nvkm_umem_search(client, handle);
+	if (IS_ERR(memory)) {
+		VMM_DEBUG(vmm, "memory %016llx %ld\n", handle, PTR_ERR(memory));
+		return PTR_ERR(memory);
+	}
+
+	vma = nvkm_vma_new(addr, size);
+	if (!vma)
+		return -ENOMEM;
+
+	vma->mapref = true;
+	vma->used = true;
+
+	mutex_lock(&vmm->mutex);
+	if (ret = -ENOENT, vma->busy) {
+		VMM_DEBUG(vmm, "denied %016llx: %d", addr, vma->busy);
+		goto fail;
+	}
+	vma->busy = true;
+	mutex_unlock(&vmm->mutex);
+
+	ret = nvkm_memory_map(memory, offset, vmm, vma, argv, argc);
+	if (ret == 0) {
+		/* Successful map will clear vma->busy. */
+		args->handle = (u64)(uintptr_t)vma;
+		nvkm_memory_unref(&memory);
+		return 0;
+	}
+
+	mutex_lock(&vmm->mutex);
+	nvkm_memory_tags_put(vma->memory, vmm->mmu->subdev.device, &vma->tags);
+	nvkm_memory_unref(&vma->memory);
+	kfree(vma);
+fail:
+	mutex_unlock(&vmm->mutex);
+	nvkm_memory_unref(&memory);
+	return ret;
+}
+
+static int
+nvkm_uvmm_mthd_raw_unmap(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	struct nvkm_vma *vma;
+
+	vma = (struct nvkm_vma *)args->handle;
+	if (!vma)
+		return -EINVAL;
+
+	mutex_lock(&vmm->mutex);
+	if (vma->busy) {
+		VMM_DEBUG(vmm, "denied %016llx: %d", vma->addr, vma->busy);
+		mutex_unlock(&vmm->mutex);
+		return -ENOENT;
+	}
+	vma->sparse = args->sparse;
+	nvkm_vmm_raw_unmap_locked(vmm, vma);
+	mutex_unlock(&vmm->mutex);
+
+	args->handle = 0;
+	kfree(vma);
+	return 0;
+}
+
+static int
+nvkm_uvmm_mthd_raw_sparse(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
+{
+	struct nvkm_vmm *vmm = uvmm->vmm;
+	int ret;
+
+	if (!nvkm_uvmm_in_managed_range(uvmm, args->addr, args->size))
+		return -EINVAL;
+
+	mutex_lock(&vmm->mutex);
+	ret = nvkm_vmm_raw_sparse_locked(vmm, args->addr, args->size, args->ref);
+	mutex_unlock(&vmm->mutex);
+
+	return ret;
+}
+
+static int
+nvkm_uvmm_mthd_raw(struct nvkm_uvmm *uvmm, void *argv, u32 argc)
+{
+	union {
+		struct nvif_vmm_raw_v0 v0;
+	} *args = argv;
+	int ret = -ENOSYS;
+
+	if (!uvmm->raw)
+		return -EINVAL;
+
+	if ((ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, true)))
+		return ret;
+
+	switch (args->v0.op) {
+	case NVIF_VMM_RAW_V0_MAP:
+		return nvkm_uvmm_mthd_raw_map(uvmm, &args->v0, argv, argc);
+	case NVIF_VMM_RAW_V0_UNMAP:
+		return nvkm_uvmm_mthd_raw_unmap(uvmm, &args->v0);
+	case NVIF_VMM_RAW_V0_SPARSE:
+		return nvkm_uvmm_mthd_raw_sparse(uvmm, &args->v0);
+	default:
+		return -EINVAL;
+	};
+}
+
 static int
 nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 {
@@ -326,6 +483,7 @@ nvkm_uvmm_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
 	case NVIF_VMM_V0_UNMAP : return nvkm_uvmm_mthd_unmap (uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNMAP: return nvkm_uvmm_mthd_pfnmap(uvmm, argv, argc);
 	case NVIF_VMM_V0_PFNCLR: return nvkm_uvmm_mthd_pfnclr(uvmm, argv, argc);
+	case NVIF_VMM_V0_RAW   : return nvkm_uvmm_mthd_raw   (uvmm, argv, argc);
 	case NVIF_VMM_V0_MTHD(0x00) ... NVIF_VMM_V0_MTHD(0x7f):
 		if (uvmm->vmm->func->mthd) {
 			return uvmm->vmm->func->mthd(uvmm->vmm,
@@ -366,10 +524,11 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 	struct nvkm_uvmm *uvmm;
 	int ret = -ENOSYS;
 	u64 addr, size;
-	bool managed;
+	bool managed, raw;
 
 	if (!(ret = nvif_unpack(ret, &argv, &argc, args->v0, 0, 0, more))) {
-		managed = args->v0.managed != 0;
+		managed = args->v0.type == NVIF_VMM_V0_TYPE_MANAGED;
+		raw = args->v0.type == NVIF_VMM_V0_TYPE_RAW;
 		addr = args->v0.addr;
 		size = args->v0.size;
 	} else
@@ -377,12 +536,13 @@ nvkm_uvmm_new(const struct nvkm_oclass *oclass, void *argv, u32 argc,
 
 	if (!(uvmm = kzalloc(sizeof(*uvmm), GFP_KERNEL)))
 		return -ENOMEM;
+	uvmm->raw = raw;
 	nvkm_object_ctor(&nvkm_uvmm, oclass, &uvmm->object);
 	*pobject = &uvmm->object;
 
 	if (!mmu->vmm) {
-		ret = mmu->func->vmm.ctor(mmu, managed, addr, size, argv, argc,
-					  NULL, "user", &uvmm->vmm);
+		ret = mmu->func->vmm.ctor(mmu, managed || raw, addr, size,
+					  argv, argc, NULL, "user", &uvmm->vmm);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h
index 71dab55e18a9..7f6fb1fb46bd 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h
@@ -7,6 +7,7 @@
 struct nvkm_uvmm {
 	struct nvkm_object object;
 	struct nvkm_vmm *vmm;
+	bool raw;
 };
 
 int nvkm_uvmm_new(const struct nvkm_oclass *, void *argv, u32 argc,
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
index ae793f400ba1..255ab920cb15 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -744,7 +744,7 @@ nvkm_vmm_ptes_get(struct nvkm_vmm *vmm, const struct nvkm_vmm_page *page,
 	return 0;
 }
 
-static inline struct nvkm_vma *
+struct nvkm_vma *
 nvkm_vma_new(u64 addr, u64 size)
 {
 	struct nvkm_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
@@ -1101,6 +1101,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		if (addr && (ret = nvkm_vmm_ctor_managed(vmm, 0, addr)))
 			return ret;
 
+		vmm->managed.p.addr = 0;
+		vmm->managed.p.size = addr;
+
 		/* NVKM-managed area. */
 		if (size) {
 			if (!(vma = nvkm_vma_new(addr, size)))
@@ -1114,6 +1117,9 @@ nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
 		size = vmm->limit - addr;
 		if (size && (ret = nvkm_vmm_ctor_managed(vmm, addr, size)))
 			return ret;
+
+		vmm->managed.n.addr = addr;
+		vmm->managed.n.size = size;
 	} else {
 		/* Address-space fully managed by NVKM, requiring calls to
 		 * nvkm_vmm_get()/nvkm_vmm_put() to allocate address-space.
@@ -1326,6 +1332,19 @@ nvkm_vmm_pfn_map(struct nvkm_vmm *vmm, u8 shift, u64 addr, u64 size, u64 *pfn)
 	return 0;
 }
 
+void
+nvkm_vmm_raw_unmap_locked(struct nvkm_vmm *vmm, struct nvkm_vma *vma)
+{
+	const struct nvkm_vmm_page *page = &vmm->func->page[vma->refd];
+
+	nvkm_vmm_ptes_unmap_put(vmm, page, vma->addr, vma->size, vma->sparse, false);
+	vma->refd = NVKM_VMA_PAGE_NONE;
+
+	nvkm_memory_tags_put(vma->memory, vmm->mmu->subdev.device, &vma->tags);
+	nvkm_memory_unref(&vma->memory);
+	vma->mapped = false;
+}
+
 void
 nvkm_vmm_unmap_region(struct nvkm_vmm *vmm, struct nvkm_vma *vma)
 {
@@ -1775,6 +1794,17 @@ nvkm_vmm_get(struct nvkm_vmm *vmm, u8 page, u64 size, struct nvkm_vma **pvma)
 	return ret;
 }
 
+int nvkm_vmm_raw_sparse_locked(struct nvkm_vmm *vmm, u64 addr, u64 size, bool ref)
+{
+	int ret;
+
+	ret = nvkm_vmm_ptes_sparse(vmm, addr, size, ref);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 void
 nvkm_vmm_part(struct nvkm_vmm *vmm, struct nvkm_memory *inst)
 {
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
index f6188aa9171c..7bb1905b70f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
@@ -163,6 +163,7 @@ int nvkm_vmm_new_(const struct nvkm_vmm_func *, struct nvkm_mmu *,
 		  u32 pd_header, bool managed, u64 addr, u64 size,
 		  struct lock_class_key *, const char *name,
 		  struct nvkm_vmm **);
+struct nvkm_vma *nvkm_vma_new(u64 addr, u64 size);
 struct nvkm_vma *nvkm_vmm_node_search(struct nvkm_vmm *, u64 addr);
 struct nvkm_vma *nvkm_vmm_node_split(struct nvkm_vmm *, struct nvkm_vma *,
 				     u64 addr, u64 size);
@@ -172,6 +173,8 @@ int nvkm_vmm_get_locked(struct nvkm_vmm *, bool getref, bool mapref,
 void nvkm_vmm_put_locked(struct nvkm_vmm *, struct nvkm_vma *);
 void nvkm_vmm_unmap_locked(struct nvkm_vmm *, struct nvkm_vma *, bool pfn);
 void nvkm_vmm_unmap_region(struct nvkm_vmm *, struct nvkm_vma *);
+void nvkm_vmm_raw_unmap_locked(struct nvkm_vmm *vmm, struct nvkm_vma *vma);
+int nvkm_vmm_raw_sparse_locked(struct nvkm_vmm *, u64 addr, u64 size, bool ref);
 
 #define NVKM_VMM_PFN_ADDR                                 0xfffffffffffff000ULL
 #define NVKM_VMM_PFN_ADDR_SHIFT                                              12
-- 
2.39.0



* [PATCH drm-next 12/14] drm/nouveau: implement uvmm for user mode bindings
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (10 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  6:12 ` [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

uvmm provides the driver abstraction around the DRM GPU VA manager,
connecting it to the nouveau infrastructure.

It handles the split and merge operations provided by the DRM GPU VA
manager for map operations colliding with existing mappings and takes
care of the driver-specific locking around the DRM GPU VA manager.
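
For illustration, a sketch of how a VM_BIND operation handler might
drive uvmm; the op structure and its fields are placeholders, while the
uvmm interfaces are the ones introduced by this patch:

    nouveau_uvmm_lock(uvmm);
    ret = nouveau_uvmm_sm_map(uvmm, op->addr, op->range,
                              gem_obj, op->bo_offset, op->kind);
    nouveau_uvmm_unlock(uvmm);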

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/Kbuild          |   1 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c |   7 +
 drivers/gpu/drm/nouveau/nouveau_bo.c    | 147 +++---
 drivers/gpu/drm/nouveau/nouveau_bo.h    |   2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c   |   2 +
 drivers/gpu/drm/nouveau/nouveau_drv.h   |  48 ++
 drivers/gpu/drm/nouveau/nouveau_gem.c   |  51 ++-
 drivers/gpu/drm/nouveau/nouveau_mem.h   |   5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c |   2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c  | 575 ++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h  |  68 +++
 11 files changed, 835 insertions(+), 73 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h

diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index 5e5617006da5..ee281bb76463 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -47,6 +47,7 @@ nouveau-y += nouveau_prime.o
 nouveau-y += nouveau_sgdma.o
 nouveau-y += nouveau_ttm.o
 nouveau-y += nouveau_vmm.o
+nouveau-y += nouveau_uvmm.o
 
 # DRM - modesetting
 nouveau-$(CONFIG_DRM_NOUVEAU_BACKLIGHT) += nouveau_backlight.o
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 82dab51d8aeb..36cc80eb0e20 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -261,6 +261,13 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 	if (!drm->channel)
 		return nouveau_abi16_put(abi16, -ENODEV);
 
+	/* If uvmm wasn't initialized until now disable it completely to prevent
+	 * userspace from mixing up UAPIs.
+	 *
+	 * The client lock is already acquired by nouveau_abi16_get().
+	 */
+	__nouveau_cli_uvmm_disable(cli);
+
 	device = &abi16->device;
 	engine = NV_DEVICE_HOST_RUNLIST_ENGINES_GR;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 4cdeda7fe2df..03bbee291fc9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -199,7 +199,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, int *align, u64 *size)
 
 struct nouveau_bo *
 nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
-		 u32 tile_mode, u32 tile_flags)
+		 u32 tile_mode, u32 tile_flags, bool internal)
 {
 	struct nouveau_drm *drm = cli->drm;
 	struct nouveau_bo *nvbo;
@@ -235,68 +235,103 @@ nouveau_bo_alloc(struct nouveau_cli *cli, u64 *size, int *align, u32 domain,
 			nvbo->force_coherent = true;
 	}
 
-	if (cli->device.info.family >= NV_DEVICE_INFO_V0_FERMI) {
-		nvbo->kind = (tile_flags & 0x0000ff00) >> 8;
-		if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
-			kfree(nvbo);
-			return ERR_PTR(-EINVAL);
+	nvbo->contig = !(tile_flags & NOUVEAU_GEM_TILE_NONCONTIG);
+	if (!nouveau_cli_uvmm(cli) || internal) {
+		/* for BO noVM allocs, don't assign kinds */
+		if (cli->device.info.family >= NV_DEVICE_INFO_V0_FERMI) {
+			nvbo->kind = (tile_flags & 0x0000ff00) >> 8;
+			if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+				kfree(nvbo);
+				return ERR_PTR(-EINVAL);
+			}
+
+			nvbo->comp = mmu->kind[nvbo->kind] != nvbo->kind;
+		} else if (cli->device.info.family >= NV_DEVICE_INFO_V0_TESLA) {
+			nvbo->kind = (tile_flags & 0x00007f00) >> 8;
+			nvbo->comp = (tile_flags & 0x00030000) >> 16;
+			if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+				kfree(nvbo);
+				return ERR_PTR(-EINVAL);
+			}
+		} else {
+			nvbo->zeta = (tile_flags & 0x00000007);
 		}
+		nvbo->mode = tile_mode;
+
+		/* Determine the desirable target GPU page size for the buffer. */
+		for (i = 0; i < vmm->page_nr; i++) {
+			/* Because we cannot currently allow VMM maps to fail
+			 * during buffer migration, we need to determine page
+			 * size for the buffer up-front, and pre-allocate its
+			 * page tables.
+			 *
+			 * Skip page sizes that can't support needed domains.
+			 */
+			if (cli->device.info.family > NV_DEVICE_INFO_V0_CURIE &&
+			    (domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
+				continue;
+			if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
+			    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
+				continue;
 
-		nvbo->comp = mmu->kind[nvbo->kind] != nvbo->kind;
-	} else
-	if (cli->device.info.family >= NV_DEVICE_INFO_V0_TESLA) {
-		nvbo->kind = (tile_flags & 0x00007f00) >> 8;
-		nvbo->comp = (tile_flags & 0x00030000) >> 16;
-		if (!nvif_mmu_kind_valid(mmu, nvbo->kind)) {
+			/* Select this page size if it's the first that supports
+			 * the potential memory domains, or when it's compatible
+			 * with the requested compression settings.
+			 */
+			if (pi < 0 || !nvbo->comp || vmm->page[i].comp)
+				pi = i;
+
+			/* Stop once the buffer is larger than the current page size. */
+			if (*size >= 1ULL << vmm->page[i].shift)
+				break;
+		}
+
+		if (WARN_ON(pi < 0)) {
 			kfree(nvbo);
 			return ERR_PTR(-EINVAL);
 		}
-	} else {
-		nvbo->zeta = (tile_flags & 0x00000007);
-	}
-	nvbo->mode = tile_mode;
-	nvbo->contig = !(tile_flags & NOUVEAU_GEM_TILE_NONCONTIG);
-
-	/* Determine the desirable target GPU page size for the buffer. */
-	for (i = 0; i < vmm->page_nr; i++) {
-		/* Because we cannot currently allow VMM maps to fail
-		 * during buffer migration, we need to determine page
-		 * size for the buffer up-front, and pre-allocate its
-		 * page tables.
-		 *
-		 * Skip page sizes that can't support needed domains.
-		 */
-		if (cli->device.info.family > NV_DEVICE_INFO_V0_CURIE &&
-		    (domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
-			continue;
-		if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
-		    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
-			continue;
 
-		/* Select this page size if it's the first that supports
-		 * the potential memory domains, or when it's compatible
-		 * with the requested compression settings.
-		 */
-		if (pi < 0 || !nvbo->comp || vmm->page[i].comp)
-			pi = i;
-
-		/* Stop once the buffer is larger than the current page size. */
-		if (*size >= 1ULL << vmm->page[i].shift)
-			break;
-	}
+		/* Disable compression if suitable settings couldn't be found. */
+		if (nvbo->comp && !vmm->page[pi].comp) {
+			if (mmu->object.oclass >= NVIF_CLASS_MMU_GF100)
+				nvbo->kind = mmu->kind[nvbo->kind];
+			nvbo->comp = 0;
+		}
+		nvbo->page = vmm->page[pi].shift;
+	} else {
+		/* reject other tile flags when in VM mode. */
+		if (tile_mode)
+			return ERR_PTR(-EINVAL);
+		if (tile_flags & ~NOUVEAU_GEM_TILE_NONCONTIG)
+			return ERR_PTR(-EINVAL);
 
-	if (WARN_ON(pi < 0)) {
-		kfree(nvbo);
-		return ERR_PTR(-EINVAL);
-	}
+		/* Determine the desirable target GPU page size for the buffer. */
+		for (i = 0; i < vmm->page_nr; i++) {
+			/* Because we cannot currently allow VMM maps to fail
+			 * during buffer migration, we need to determine page
+			 * size for the buffer up-front, and pre-allocate its
+			 * page tables.
+			 *
+			 * Skip page sizes that can't support needed domains.
+			 */
+			if ((domain & NOUVEAU_GEM_DOMAIN_VRAM) && !vmm->page[i].vram)
+				continue;
+			if ((domain & NOUVEAU_GEM_DOMAIN_GART) &&
+			    (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
+				continue;
 
-	/* Disable compression if suitable settings couldn't be found. */
-	if (nvbo->comp && !vmm->page[pi].comp) {
-		if (mmu->object.oclass >= NVIF_CLASS_MMU_GF100)
-			nvbo->kind = mmu->kind[nvbo->kind];
-		nvbo->comp = 0;
+			if (pi < 0)
+				pi = i;
+			/* Stop once the buffer is larger than the current page size. */
+			if (*size >= 1ULL << vmm->page[i].shift)
+				break;
+		}
+		if (WARN_ON(pi < 0)) {
+			kfree(nvbo);
+			return ERR_PTR(-EINVAL);
+		}
+		nvbo->page = vmm->page[pi].shift;
 	}
-	nvbo->page = vmm->page[pi].shift;
 
 	nouveau_bo_fixup_align(nvbo, align, size);
 
@@ -334,7 +369,7 @@ nouveau_bo_new(struct nouveau_cli *cli, u64 size, int align,
 	int ret;
 
 	nvbo = nouveau_bo_alloc(cli, &size, &align, domain, tile_mode,
-				tile_flags);
+				tile_flags, true);
 	if (IS_ERR(nvbo))
 		return PTR_ERR(nvbo);
 
@@ -937,11 +972,13 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object *bo,
 		list_for_each_entry(vma, &nvbo->vma_list, head) {
 			nouveau_vma_map(vma, mem);
 		}
+		nouveau_uvmm_bo_map_all(nvbo, mem);
 	} else {
 		list_for_each_entry(vma, &nvbo->vma_list, head) {
 			WARN_ON(ttm_bo_wait(bo, false, false));
 			nouveau_vma_unmap(vma);
 		}
+		nouveau_uvmm_bo_unmap_all(nvbo);
 	}
 
 	if (new_reg)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 774dd93ca76b..cb85207d9e8f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -73,7 +73,7 @@ extern struct ttm_device_funcs nouveau_bo_driver;
 
 void nouveau_bo_move_init(struct nouveau_drm *);
 struct nouveau_bo *nouveau_bo_alloc(struct nouveau_cli *, u64 *size, int *align,
-				    u32 domain, u32 tile_mode, u32 tile_flags);
+				    u32 domain, u32 tile_mode, u32 tile_flags, bool internal);
 int  nouveau_bo_init(struct nouveau_bo *, u64 size, int align, u32 domain,
 		     struct sg_table *sg, struct dma_resv *robj);
 int  nouveau_bo_new(struct nouveau_cli *, u64 size, int align, u32 domain,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 80f154b6adab..989f30a31ba9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -70,6 +70,7 @@
 #include "nouveau_platform.h"
 #include "nouveau_svm.h"
 #include "nouveau_dmem.h"
+#include "nouveau_uvmm.h"
 
 DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
 			"DRM_UT_CORE",
@@ -192,6 +193,7 @@ nouveau_cli_fini(struct nouveau_cli *cli)
 	WARN_ON(!list_empty(&cli->worker));
 
 	usif_client_fini(cli);
+	nouveau_uvmm_fini(&cli->uvmm);
 	nouveau_vmm_fini(&cli->svm);
 	nouveau_vmm_fini(&cli->vmm);
 	nvif_mmu_dtor(&cli->mmu);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 20a7f31b9082..d634f1054d65 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -64,6 +64,7 @@ struct platform_device;
 #include "nouveau_fence.h"
 #include "nouveau_bios.h"
 #include "nouveau_vmm.h"
+#include "nouveau_uvmm.h"
 
 struct nouveau_drm_tile {
 	struct nouveau_fence *fence;
@@ -91,6 +92,8 @@ struct nouveau_cli {
 	struct nvif_mmu mmu;
 	struct nouveau_vmm vmm;
 	struct nouveau_vmm svm;
+	struct nouveau_uvmm uvmm;
+
 	const struct nvif_mclass *mem;
 
 	struct list_head head;
@@ -112,15 +115,60 @@ struct nouveau_cli_work {
 	struct dma_fence_cb cb;
 };
 
+static inline struct nouveau_uvmm *
+nouveau_cli_uvmm(struct nouveau_cli *cli)
+{
+	if (!cli || !cli->uvmm.vmm.cli)
+		return NULL;
+
+	return &cli->uvmm;
+}
+
+static inline struct nouveau_uvmm *
+nouveau_cli_uvmm_locked(struct nouveau_cli *cli)
+{
+	struct nouveau_uvmm *uvmm;
+
+	mutex_lock(&cli->mutex);
+	uvmm = nouveau_cli_uvmm(cli);
+	mutex_unlock(&cli->mutex);
+
+	return uvmm;
+}
+
 static inline struct nouveau_vmm *
 nouveau_cli_vmm(struct nouveau_cli *cli)
 {
+	struct nouveau_uvmm *uvmm;
+
+	uvmm = nouveau_cli_uvmm(cli);
+	if (uvmm)
+		return &uvmm->vmm;
+
 	if (cli->svm.cli)
 		return &cli->svm;
 
 	return &cli->vmm;
 }
 
+static inline void
+__nouveau_cli_uvmm_disable(struct nouveau_cli *cli)
+{
+	struct nouveau_uvmm *uvmm;
+
+	uvmm = nouveau_cli_uvmm(cli);
+	if (!uvmm)
+		cli->uvmm.disabled = true;
+}
+
+static inline void
+nouveau_cli_uvmm_disable(struct nouveau_cli *cli)
+{
+	mutex_lock(&cli->mutex);
+	__nouveau_cli_uvmm_disable(cli);
+	mutex_unlock(&cli->mutex);
+}
+
 void nouveau_cli_work_queue(struct nouveau_cli *, struct dma_fence *,
 			    struct nouveau_cli_work *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 5dad2d0dd5cb..3370a73e6a9b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -120,7 +120,11 @@ nouveau_gem_object_open(struct drm_gem_object *gem, struct drm_file *file_priv)
 		goto out;
 	}
 
-	ret = nouveau_vma_new(nvbo, vmm, &vma);
+	/* only create a VMA on binding */
+	if (!nouveau_cli_uvmm(cli))
+		ret = nouveau_vma_new(nvbo, vmm, &vma);
+	else
+		ret = 0;
 	pm_runtime_mark_last_busy(dev);
 	pm_runtime_put_autosuspend(dev);
 out:
@@ -180,6 +184,7 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
 	struct device *dev = drm->dev->dev;
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
 	struct nouveau_vmm *vmm = nouveau_cli_vmm(cli);
 	struct nouveau_vma *vma;
 	int ret;
@@ -187,22 +192,26 @@ nouveau_gem_object_close(struct drm_gem_object *gem, struct drm_file *file_priv)
 	if (vmm->vmm.object.oclass < NVIF_CLASS_VMM_NV50)
 		return;
 
-	ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
-	if (ret)
-		return;
+	if (uvmm) {
+		nouveau_uvmm_cli_unmap_all(uvmm, gem);
+	} else {
+		ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
+		if (ret)
+			return;
 
-	vma = nouveau_vma_find(nvbo, vmm);
-	if (vma) {
-		if (--vma->refs == 0) {
-			ret = pm_runtime_get_sync(dev);
-			if (!WARN_ON(ret < 0 && ret != -EACCES)) {
-				nouveau_gem_object_unmap(nvbo, vma);
-				pm_runtime_mark_last_busy(dev);
+		vma = nouveau_vma_find(nvbo, vmm);
+		if (vma) {
+			if (--vma->refs == 0) {
+				ret = pm_runtime_get_sync(dev);
+				if (!WARN_ON(ret < 0 && ret != -EACCES)) {
+					nouveau_gem_object_unmap(nvbo, vma);
+					pm_runtime_mark_last_busy(dev);
+				}
+				pm_runtime_put_autosuspend(dev);
 			}
-			pm_runtime_put_autosuspend(dev);
 		}
+		ttm_bo_unreserve(&nvbo->bo);
 	}
-	ttm_bo_unreserve(&nvbo->bo);
 }
 
 const struct drm_gem_object_funcs nouveau_gem_object_funcs = {
@@ -231,7 +240,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
 		domain |= NOUVEAU_GEM_DOMAIN_CPU;
 
 	nvbo = nouveau_bo_alloc(cli, &size, &align, domain, tile_mode,
-				tile_flags);
+				tile_flags, false);
 	if (IS_ERR(nvbo))
 		return PTR_ERR(nvbo);
 
@@ -279,13 +288,15 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
 	else
 		rep->domain = NOUVEAU_GEM_DOMAIN_VRAM;
 	rep->offset = nvbo->offset;
-	if (vmm->vmm.object.oclass >= NVIF_CLASS_VMM_NV50) {
+	if (vmm->vmm.object.oclass >= NVIF_CLASS_VMM_NV50 &&
+	    !nouveau_cli_uvmm(cli)) {
 		vma = nouveau_vma_find(nvbo, vmm);
 		if (!vma)
 			return -EINVAL;
 
 		rep->offset = vma->addr;
-	}
+	} else
+		rep->offset = 0;
 
 	rep->size = nvbo->bo.base.size;
 	rep->map_handle = drm_vma_node_offset_addr(&nvbo->bo.base.vma_node);
@@ -310,6 +321,11 @@ nouveau_gem_ioctl_new(struct drm_device *dev, void *data,
 	struct nouveau_bo *nvbo = NULL;
 	int ret = 0;
 
+	/* If uvmm wasn't initialized until now disable it completely to prevent
+	 * userspace from mixing up UAPIs.
+	 */
+	nouveau_cli_uvmm_disable(cli);
+
 	ret = nouveau_gem_new(cli, req->info.size, req->align,
 			      req->info.domain, req->info.tile_mode,
 			      req->info.tile_flags, &nvbo);
@@ -710,6 +726,9 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 	if (unlikely(!abi16))
 		return -ENOMEM;
 
+	if (unlikely(nouveau_cli_uvmm(cli)))
+		return -ENOSYS;
+
 	list_for_each_entry(temp, &abi16->channels, head) {
 		if (temp->chan->chid == req->channel) {
 			chan = temp->chan;
diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.h b/drivers/gpu/drm/nouveau/nouveau_mem.h
index 76c86d8bb01e..5365a3d3a17f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.h
@@ -35,4 +35,9 @@ int nouveau_mem_vram(struct ttm_resource *, bool contig, u8 page);
 int nouveau_mem_host(struct ttm_resource *, struct ttm_tt *);
 void nouveau_mem_fini(struct nouveau_mem *);
 int nouveau_mem_map(struct nouveau_mem *, struct nvif_vmm *, struct nvif_vma *);
+int
+nouveau_mem_map_fixed(struct nouveau_mem *mem,
+		      struct nvif_vmm *vmm,
+		      u8 kind, u64 addr,
+		      u64 offset, u64 range);
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index f42c2b1b0363..6a883b9a799a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -50,7 +50,7 @@ struct drm_gem_object *nouveau_gem_prime_import_sg_table(struct drm_device *dev,
 
 	dma_resv_lock(robj, NULL);
 	nvbo = nouveau_bo_alloc(&drm->client, &size, &align,
-				NOUVEAU_GEM_DOMAIN_GART, 0, 0);
+				NOUVEAU_GEM_DOMAIN_GART, 0, 0, true);
 	if (IS_ERR(nvbo)) {
 		obj = ERR_CAST(nvbo);
 		goto unlock;
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
new file mode 100644
index 000000000000..47a74e3ce882
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+/*
+ * Locking:
+ *
+ * The uvmm mutex protects any operations on the GPU VA space provided by the
+ * DRM GPU VA manager.
+ *
+ * The DRM GEM GPUVA lock protects a GEM's GPUVA list. It also protects single
+ * map/unmap operations against a BO move, which itself walks the GEM's GPUVA
+ * list in order to map/unmap its entries.
+ *
+ * We'd also need to protect the DRM_GPUVA_SWAPPED flag for each individual
+ * GPUVA, however this isn't necessary since any read or write to this flag
+ * happens when we already took the DRM GEM GPUVA lock of the backing GEM of
+ * the particular GPUVA.
+ */
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_uvmm.h"
+
+#include <nvif/vmm.h>
+#include <nvif/mem.h>
+
+#include <nvif/class.h>
+#include <nvif/if000c.h>
+#include <nvif/if900d.h>
+
+#define NOUVEAU_VA_SPACE_BITS		47 /* FIXME */
+#define NOUVEAU_VA_SPACE_START		0x0
+#define NOUVEAU_VA_SPACE_END		(1ULL << NOUVEAU_VA_SPACE_BITS)
+
+struct nouveau_uvmm_map_args {
+	u8 kind;
+	bool swapped;
+};
+
+int
+nouveau_uvmm_validate_range(struct nouveau_uvmm *uvmm, u64 addr, u64 range)
+{
+	u64 end = addr + range;
+	u64 unmanaged_end = uvmm->unmanaged_addr +
+			    uvmm->unmanaged_size;
+
+	if (addr & ~PAGE_MASK)
+		return -EINVAL;
+
+	if (range & ~PAGE_MASK)
+		return -EINVAL;
+
+	if (end <= addr)
+		return -EINVAL;
+
+	if (addr < NOUVEAU_VA_SPACE_START ||
+	    end > NOUVEAU_VA_SPACE_END)
+		return -EINVAL;
+
+	if (addr < unmanaged_end &&
+	    end > uvmm->unmanaged_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int
+nouveau_uvma_map(struct nouveau_uvma *uvma,
+		 struct nouveau_mem *mem)
+{
+	struct nvif_vmm *vmm = &uvma->uvmm->vmm.vmm;
+	u64 addr = uvma->va.node.start << PAGE_SHIFT;
+	u64 offset = uvma->va.gem.offset << PAGE_SHIFT;
+	u64 range = uvma->va.node.size << PAGE_SHIFT;
+	union {
+		struct gf100_vmm_map_v0 gf100;
+	} args;
+	u32 argc = 0;
+
+	switch (vmm->object.oclass) {
+	case NVIF_CLASS_VMM_GF100:
+	case NVIF_CLASS_VMM_GM200:
+	case NVIF_CLASS_VMM_GP100:
+		args.gf100.version = 0;
+		if (mem->mem.type & NVIF_MEM_VRAM)
+			args.gf100.vol = 0;
+		else
+			args.gf100.vol = 1;
+		args.gf100.ro = 0;
+		args.gf100.priv = 0;
+		args.gf100.kind = uvma->kind;
+		argc = sizeof(args.gf100);
+		break;
+	default:
+		WARN_ON(1);
+		return -ENOSYS;
+	}
+
+	return nvif_vmm_raw_map(vmm, addr, range,
+				&args, argc,
+				&mem->mem, offset,
+				&uvma->handle);
+}
+
+static int
+nouveau_uvma_unmap(struct nouveau_uvma *uvma)
+{
+	struct nvif_vmm *vmm = &uvma->uvmm->vmm.vmm;
+	bool sparse = uvma->va.region->sparse;
+
+	if (drm_gpuva_swapped(&uvma->va))
+		return 0;
+
+	return nvif_vmm_raw_unmap(vmm, uvma->handle, sparse);
+}
+
+static void
+nouveau_uvma_destroy(struct nouveau_uvma *uvma)
+{
+	drm_gpuva_destroy_locked(&uvma->va);
+	kfree(uvma);
+}
+
+void
+nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem)
+{
+	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuva *va;
+
+	drm_gem_gpuva_lock(obj);
+	drm_gem_for_each_gpuva(va, obj) {
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+
+		nouveau_uvma_map(uvma, mem);
+		drm_gpuva_swap(va, false);
+	}
+	drm_gem_gpuva_unlock(obj);
+}
+
+void
+nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo)
+{
+	struct drm_gem_object *obj = &nvbo->bo.base;
+	struct drm_gpuva *va;
+
+	drm_gem_gpuva_lock(obj);
+	drm_gem_for_each_gpuva(va, obj) {
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+
+		nouveau_uvma_unmap(uvma);
+		drm_gpuva_swap(va, true);
+	}
+	drm_gem_gpuva_unlock(obj);
+}
+
+void
+nouveau_uvmm_cli_unmap_all(struct nouveau_uvmm *uvmm,
+			   struct drm_gem_object *obj)
+{
+	struct drm_gpuva *va, *tmp;
+
+	nouveau_uvmm_lock(uvmm);
+	drm_gem_gpuva_lock(obj);
+	drm_gem_for_each_gpuva_safe(va, tmp, obj) {
+		struct nouveau_uvma *uvma = uvma_from_va(va);
+
+		if (&uvmm->umgr == va->mgr) {
+			nouveau_uvma_unmap(uvma);
+			nouveau_uvma_destroy(uvma);
+		}
+	}
+	drm_gem_gpuva_unlock(obj);
+	nouveau_uvmm_unlock(uvmm);
+}
+
+static void
+nouveau_uvmm_unmap_range(struct nouveau_uvmm *uvmm,
+			 u64 addr, u64 range)
+{
+	struct drm_gpuva *va, *next;
+	u64 end = addr + range;
+
+	addr >>= PAGE_SHIFT;
+	range >>= PAGE_SHIFT;
+	end  >>= PAGE_SHIFT;
+
+	drm_gpuva_for_each_va_safe(va, next, &uvmm->umgr) {
+		if (addr >= va->node.start &&
+		    end <= va->node.start + va->node.size) {
+			struct nouveau_uvma *uvma = uvma_from_va(va);
+			struct drm_gem_object *obj = va->gem.obj;
+
+			drm_gem_gpuva_lock(obj);
+			nouveau_uvma_unmap(uvma);
+			nouveau_uvma_destroy(uvma);
+			drm_gem_gpuva_unlock(obj);
+		}
+	}
+}
+
+static int
+nouveau_uvma_new(struct nouveau_uvmm *uvmm,
+		 struct drm_gem_object *obj,
+		 u64 bo_offset, u64 addr,
+		 u64 range, u8 kind,
+		 struct nouveau_uvma **puvma)
+{
+	struct nouveau_uvma *uvma;
+	int ret;
+
+	addr >>= PAGE_SHIFT;
+	bo_offset >>= PAGE_SHIFT;
+	range >>= PAGE_SHIFT;
+
+	uvma = *puvma = kzalloc(sizeof(*uvma), GFP_KERNEL);
+	if (!uvma)
+		return -ENOMEM;
+
+	uvma->uvmm = uvmm;
+	uvma->kind = kind;
+	uvma->va.gem.offset = bo_offset;
+	uvma->va.gem.obj = obj;
+
+	ret = drm_gpuva_insert(&uvmm->umgr, &uvma->va, addr, range);
+	if (ret) {
+		kfree(uvma);
+		*puvma = NULL;
+		return ret;
+	}
+	drm_gpuva_link_locked(&uvma->va);
+
+	return 0;
+}
+
+int
+nouveau_uvma_region_new(struct nouveau_uvmm *uvmm,
+			u64 addr, u64 range,
+			bool sparse)
+{
+	struct nouveau_uvma_region *reg;
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+	int ret;
+
+	reg = kzalloc(sizeof(*reg), GFP_KERNEL);
+	if (!reg)
+		return -ENOMEM;
+
+	reg->uvmm = uvmm;
+	reg->region.sparse = sparse;
+
+	ret = drm_gpuva_region_insert(&uvmm->umgr, &reg->region,
+				      addr >> PAGE_SHIFT,
+				      range >> PAGE_SHIFT);
+	if (ret)
+		goto err_free_region;
+
+	if (sparse) {
+		ret = nvif_vmm_raw_sparse(vmm, addr, range, true);
+		if (ret)
+			goto err_destroy_region;
+	}
+
+	return 0;
+
+err_destroy_region:
+	drm_gpuva_region_destroy(&uvmm->umgr, &reg->region);
+err_free_region:
+	kfree(reg);
+	return ret;
+}
+
+static void
+__nouveau_uvma_region_destroy(struct nouveau_uvma_region *reg)
+{
+	struct nouveau_uvmm *uvmm = reg->uvmm;
+	struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+	u64 addr = reg->region.node.start << PAGE_SHIFT;
+	u64 range = reg->region.node.size << PAGE_SHIFT;
+
+	nouveau_uvmm_unmap_range(uvmm, addr, range);
+
+	if (reg->region.sparse)
+		nvif_vmm_raw_sparse(vmm, addr, range, false);
+
+	drm_gpuva_region_destroy(&uvmm->umgr, &reg->region);
+	kfree(reg);
+}
+
+int
+nouveau_uvma_region_destroy(struct nouveau_uvmm *uvmm,
+			    u64 addr, u64 range)
+{
+	struct drm_gpuva_region *reg;
+
+	reg = drm_gpuva_region_find(&uvmm->umgr,
+				    addr >> PAGE_SHIFT,
+				    range >> PAGE_SHIFT);
+	if (!reg)
+		return -ENOENT;
+
+	__nouveau_uvma_region_destroy(uvma_region_from_va_region(reg));
+
+	return 0;
+}
+
+static int
+op_map(struct nouveau_uvmm *uvmm,
+       struct drm_gpuva_op_map *m,
+       struct nouveau_uvmm_map_args *args)
+{
+	struct nouveau_uvma *uvma;
+	struct nouveau_bo *nvbo = nouveau_gem_object(m->gem.obj);
+	int ret;
+
+	ret = nouveau_uvma_new(uvmm, m->gem.obj,
+			       m->gem.offset << PAGE_SHIFT,
+			       m->va.addr << PAGE_SHIFT,
+			       m->va.range << PAGE_SHIFT,
+			       args->kind, &uvma);
+	if (ret)
+		return ret;
+
+	drm_gpuva_swap(&uvma->va, args->swapped);
+	if (!args->swapped) {
+		ret = nouveau_uvma_map(uvma, nouveau_mem(nvbo->bo.resource));
+		if (ret) {
+			nouveau_uvma_destroy(uvma);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int
+op_unmap(struct nouveau_uvmm *uvmm,
+	 struct drm_gpuva_op_unmap *u)
+{
+	struct nouveau_uvma *uvma = uvma_from_va(u->va);
+	int ret;
+
+	ret = nouveau_uvma_unmap(uvma);
+	if (ret)
+		return ret;
+
+	nouveau_uvma_destroy(uvma);
+
+	return 0;
+}
+
+static struct drm_gem_object *
+op_gem_obj(struct drm_gpuva_op *op)
+{
+	switch (op->op) {
+	case DRM_GPUVA_OP_MAP:
+		return op->map.gem.obj;
+	case DRM_GPUVA_OP_REMAP:
+		return op->remap.unmap->va->gem.obj;
+	case DRM_GPUVA_OP_UNMAP:
+		return op->unmap.va->gem.obj;
+	default:
+		WARN(1, "unknown operation");
+		return NULL;
+	}
+}
+
+static int
+process_sm_ops(struct nouveau_uvmm *uvmm, struct drm_gpuva_ops *ops,
+	       struct nouveau_uvmm_map_args *args)
+{
+	struct drm_gpuva_op *op;
+	struct drm_gem_object *obj;
+	int ret = 0;
+
+	drm_gpuva_for_each_op(op, ops) {
+		obj = op_gem_obj(op);
+		if (!obj)
+			continue;
+
+		drm_gem_gpuva_lock(obj);
+
+		switch (op->op) {
+		case DRM_GPUVA_OP_MAP:
+			ret = op_map(uvmm, &op->map, args);
+			if (ret)
+				goto err_unlock;
+
+			break;
+		case DRM_GPUVA_OP_REMAP:
+		{
+			struct drm_gpuva_op_remap *r = &op->remap;
+			struct drm_gpuva *va = r->unmap->va;
+			struct nouveau_uvmm_map_args remap_args = {
+				.kind = uvma_from_va(r->unmap->va)->kind,
+				.swapped = drm_gpuva_swapped(va),
+			};
+
+			ret = op_unmap(uvmm, r->unmap);
+			if (ret)
+				goto err_unlock;
+
+			if (r->prev) {
+				ret = op_map(uvmm, r->prev, &remap_args);
+				if (ret)
+					goto err_unlock;
+			}
+
+			if (r->next) {
+				ret = op_map(uvmm, r->next, &remap_args);
+				if (ret)
+					goto err_unlock;
+			}
+
+			break;
+		}
+		case DRM_GPUVA_OP_UNMAP:
+			ret = op_unmap(uvmm, &op->unmap);
+			if (ret)
+				goto err_unlock;
+
+			break;
+		}
+
+		drm_gem_gpuva_unlock(obj);
+	}
+
+	return 0;
+
+err_unlock:
+	drm_gem_gpuva_unlock(obj);
+	return ret;
+}
+
+int
+nouveau_uvmm_sm_map(struct nouveau_uvmm *uvmm, u64 addr, u64 range,
+		    struct drm_gem_object *obj, u64 offset, u8 kind)
+{
+	struct drm_gpuva_ops *ops;
+	struct nouveau_uvmm_map_args args = {
+		.kind = kind,
+		.swapped = false,
+	};
+	int ret;
+
+	ops = drm_gpuva_sm_map_ops_create(&uvmm->umgr,
+					  addr >> PAGE_SHIFT,
+					  range >> PAGE_SHIFT,
+					  obj, offset >> PAGE_SHIFT);
+	if (IS_ERR(ops))
+		return PTR_ERR(ops);
+
+	ret = process_sm_ops(uvmm, ops, &args);
+	drm_gpuva_ops_free(ops);
+
+	return ret;
+}
+
+int
+nouveau_uvmm_sm_unmap(struct nouveau_uvmm *uvmm, u64 addr, u64 range)
+{
+	struct drm_gpuva_ops *ops;
+	int ret;
+
+	ops = drm_gpuva_sm_unmap_ops_create(&uvmm->umgr,
+					    addr >> PAGE_SHIFT,
+					    range >> PAGE_SHIFT);
+	if (IS_ERR(ops))
+		return PTR_ERR(ops);
+
+	ret = process_sm_ops(uvmm, ops, NULL);
+	drm_gpuva_ops_free(ops);
+
+	return ret;
+}
+
+int nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
+		      struct drm_nouveau_vm_init *init)
+{
+	int ret;
+	u64 unmanaged_end = init->unmanaged_addr + init->unmanaged_size;
+
+	mutex_lock(&cli->mutex);
+
+	if (unlikely(cli->uvmm.disabled)) {
+		ret = -ENOSYS;
+		goto out_unlock;
+	}
+
+	if (unmanaged_end <= init->unmanaged_addr) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (unmanaged_end > NOUVEAU_VA_SPACE_END) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	uvmm->unmanaged_addr = init->unmanaged_addr;
+	uvmm->unmanaged_size = init->unmanaged_size;
+
+	drm_gpuva_manager_init(&uvmm->umgr, cli->name,
+			       NOUVEAU_VA_SPACE_START >> PAGE_SHIFT,
+			       NOUVEAU_VA_SPACE_END >> PAGE_SHIFT,
+			       init->unmanaged_addr >> PAGE_SHIFT,
+			       init->unmanaged_size >> PAGE_SHIFT);
+
+	ret = nvif_vmm_ctor(&cli->mmu, "uvmm",
+			    cli->vmm.vmm.object.oclass, RAW,
+			    init->unmanaged_addr, init->unmanaged_size,
+			    NULL, 0, &cli->uvmm.vmm.vmm);
+	if (ret)
+		goto out_free_gpuva_mgr;
+
+	cli->uvmm.vmm.cli = cli;
+	mutex_unlock(&cli->mutex);
+
+	mutex_init(&uvmm->mutex);
+
+	return 0;
+
+out_free_gpuva_mgr:
+	drm_gpuva_manager_destroy(&uvmm->umgr);
+out_unlock:
+	mutex_unlock(&cli->mutex);
+	return ret;
+}
+
+void nouveau_uvmm_fini(struct nouveau_uvmm *uvmm)
+{
+	struct nouveau_cli *cli = uvmm->vmm.cli;
+	struct drm_gpuva_region *reg, *next;
+
+	if (!cli)
+		return;
+
+	/* Destroying a region implies destroying all mappings within the
+	 * region.
+	 */
+	nouveau_uvmm_lock(uvmm);
+	drm_gpuva_for_each_region_safe(reg, next, &uvmm->umgr)
+		if (&reg->node != &uvmm->umgr.kernel_alloc_node)
+			__nouveau_uvma_region_destroy(uvma_region_from_va_region(reg));
+	nouveau_uvmm_unlock(uvmm);
+
+	mutex_lock(&cli->mutex);
+	nouveau_vmm_fini(&uvmm->vmm);
+	drm_gpuva_manager_destroy(&uvmm->umgr);
+	mutex_unlock(&cli->mutex);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
new file mode 100644
index 000000000000..b0ad57004aa6
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: MIT
+
+#ifndef __NOUVEAU_UVMM_H__
+#define __NOUVEAU_UVMM_H__
+
+#include <drm/drm_gpuva_mgr.h>
+
+#include "nouveau_drv.h"
+
+struct nouveau_uvmm {
+	struct nouveau_vmm vmm;
+	struct drm_gpuva_manager umgr;
+	struct mutex mutex;
+
+	u64 unmanaged_addr;
+	u64 unmanaged_size;
+
+	bool disabled;
+};
+
+struct nouveau_uvma_region {
+	struct drm_gpuva_region region;
+	struct nouveau_uvmm *uvmm;
+};
+
+struct nouveau_uvma {
+	struct drm_gpuva va;
+	struct nouveau_uvmm *uvmm;
+	u64 handle;
+	u8 kind;
+};
+
+#define uvmm_from_mgr(x) container_of((x), struct nouveau_uvmm, umgr)
+#define uvma_from_va(x) container_of((x), struct nouveau_uvma, va)
+#define uvma_region_from_va_region(x) container_of((x), struct nouveau_uvma_region, region)
+
+int nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli *cli,
+		      struct drm_nouveau_vm_init *init);
+void nouveau_uvmm_fini(struct nouveau_uvmm *uvmm);
+
+int nouveau_uvma_region_new(struct nouveau_uvmm *uvmm,
+			    u64 addr, u64 range,
+			    bool sparse);
+int nouveau_uvma_region_destroy(struct nouveau_uvmm *uvmm,
+				u64 addr, u64 range);
+
+int nouveau_uvmm_sm_map(struct nouveau_uvmm *uvmm, u64 addr, u64 range,
+			struct drm_gem_object *obj, u64 offset, u8 kind);
+int nouveau_uvmm_sm_unmap(struct nouveau_uvmm *uvmm, u64 addr, u64 range);
+
+void nouveau_uvmm_cli_unmap_all(struct nouveau_uvmm *uvmm,
+				struct drm_gem_object *obj);
+void nouveau_uvmm_bo_map_all(struct nouveau_bo *nvbo, struct nouveau_mem *mem);
+void nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo);
+
+int nouveau_uvmm_validate_range(struct nouveau_uvmm *uvmm, u64 addr, u64 range);
+
+static inline void nouveau_uvmm_lock(struct nouveau_uvmm *uvmm)
+{
+	mutex_lock(&uvmm->mutex);
+}
+
+static inline void nouveau_uvmm_unlock(struct nouveau_uvmm *uvmm)
+{
+	mutex_unlock(&uvmm->mutex);
+}
+
+#endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (11 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 12/14] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18 20:37   ` Thomas Hellström (Intel)
  2023-01-18  6:12 ` [PATCH drm-next 14/14] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
  2023-01-18  8:53 ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Christian König
  14 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
   DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
   space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
   to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
   UMDs can request these operations to be processed either
   synchronously or asynchronously. It supports DRM syncobjs
   (incl. timelines) as the synchronization mechanism. The management of the
   GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
   execution happens asynchronously. It supports DRM syncobjs (incl.
   timelines) as the synchronization mechanism. DRM GEM object locking is
   handled with drm_exec.

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC use the DRM
GPU scheduler for the asynchronous paths.
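
A rough sketch of how a UMD might drive the new ioctls (illustrative only:
error handling is omitted, and fd, bo_handle, bind_syncobj, channel_id,
push_va/push_len and the VA placeholders stand in for values the UMD already
has; the authoritative structure layout is the one added to
include/uapi/drm/nouveau_drm.h by this series):

    /* Illustrative sketch; all *_addr, *_range and handle values are
     * placeholders.
     */

    /* 1) Initialize the VA space, reserving the kernel managed portion. */
    struct drm_nouveau_vm_init init = {
            .unmanaged_addr = kernel_managed_addr,
            .unmanaged_size = kernel_managed_size,
    };
    ioctl(fd, DRM_IOCTL_NOUVEAU_VM_INIT, &init);

    /* 2) Allocate a sparse region and map a BO into it, asynchronously,
     * signaling a syncobj once the bind job finished.
     */
    struct drm_nouveau_vm_bind_op ops[] = {
            {
                    .op = DRM_NOUVEAU_VM_BIND_OP_ALLOC,
                    .flags = DRM_NOUVEAU_VM_BIND_SPARSE,
                    .addr = region_addr,
                    .range = region_range,
            },
            {
                    .op = DRM_NOUVEAU_VM_BIND_OP_MAP,
                    .handle = bo_handle,
                    .bo_offset = 0,
                    .addr = map_addr,
                    .range = map_range,
            },
    };
    struct drm_nouveau_sync bind_sig = {
            .flags = DRM_NOUVEAU_SYNC_SYNCOBJ,
            .handle = bind_syncobj,
    };
    struct drm_nouveau_vm_bind bind = {
            .flags = DRM_NOUVEAU_VM_BIND_RUN_ASYNC,
            .op_count = 2,
            .op_ptr = (uintptr_t)ops,
            .sig_count = 1,
            .sig_ptr = (uintptr_t)&bind_sig,
    };
    ioctl(fd, DRM_IOCTL_NOUVEAU_VM_BIND, &bind);

    /* 3) Submit a push buffer on a channel, waiting for the bind above. */
    struct drm_nouveau_exec_push push = {
            .va = push_va,
            .va_len = push_len,
    };
    struct drm_nouveau_exec exec = {
            .channel = channel_id,
            .push_count = 1,
            .push_ptr = (uintptr_t)&push,
            .wait_count = 1,
            .wait_ptr = (uintptr_t)&bind_sig,
    };
    ioctl(fd, DRM_IOCTL_NOUVEAU_EXEC, &exec);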

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 Documentation/gpu/driver-uapi.rst       |   3 +
 drivers/gpu/drm/nouveau/Kbuild          |   2 +
 drivers/gpu/drm/nouveau/Kconfig         |   2 +
 drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++++++++++
 drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
 drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
 11 files changed, 1295 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
index 9c7ca6e33a68..c08bcbb95fb3 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -13,4 +13,7 @@ drm/nouveau uAPI
 VM_BIND / EXEC uAPI
 -------------------
 
+.. kernel-doc:: drivers/gpu/drm/nouveau/nouveau_exec.c
+    :doc: Overview
+
 .. kernel-doc:: include/uapi/drm/nouveau_drm.h
diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index ee281bb76463..cf6b3a80c0c8 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -47,6 +47,8 @@ nouveau-y += nouveau_prime.o
 nouveau-y += nouveau_sgdma.o
 nouveau-y += nouveau_ttm.o
 nouveau-y += nouveau_vmm.o
+nouveau-y += nouveau_exec.o
+nouveau-y += nouveau_sched.o
 nouveau-y += nouveau_uvmm.o
 
 # DRM - modesetting
diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index a0bb3987bf63..59e5c13be9b6 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -10,6 +10,8 @@ config DRM_NOUVEAU
 	select DRM_KMS_HELPER
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
+	select DRM_SCHED
 	select I2C
 	select I2C_ALGOBIT
 	select BACKLIGHT_CLASS_DEVICE if DRM_NOUVEAU_BACKLIGHT
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 36cc80eb0e20..694777a58bca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -35,6 +35,7 @@
 #include "nouveau_chan.h"
 #include "nouveau_abi16.h"
 #include "nouveau_vmm.h"
+#include "nouveau_sched.h"
 
 static struct nouveau_abi16 *
 nouveau_abi16(struct drm_file *file_priv)
@@ -125,6 +126,17 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
 {
 	struct nouveau_abi16_ntfy *ntfy, *temp;
 
+	/* When a client exits without waiting for its queued up jobs to
+	 * finish, we might end up faulting the channel. This is due to
+	 * drm_file_free() calling drm_gem_release() before the postclose()
+	 * callback. Hence, we can't tear down this scheduler entity before
+	 * uvmm mappings are unmapped. Currently, we can't detect this case.
+	 *
+	 * However, this should be rare and harmless, since the channel isn't
+	 * needed anymore.
+	 */
+	nouveau_sched_entity_fini(&chan->sched_entity);
+
 	/* wait for all activity to stop before cleaning up */
 	if (chan->chan)
 		nouveau_channel_idle(chan->chan);
@@ -311,6 +323,10 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 	if (ret)
 		goto done;
 
+	ret = nouveau_sched_entity_init(&chan->sched_entity, &drm->sched);
+	if (ret)
+		goto done;
+
 	init->channel = chan->chan->chid;
 
 	if (device->info.family >= NV_DEVICE_INFO_V0_TESLA)
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 27eae85f33e6..8209eb28feaf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -26,6 +26,7 @@ struct nouveau_abi16_chan {
 	struct nouveau_bo *ntfy;
 	struct nouveau_vma *ntfy_vma;
 	struct nvkm_mm  heap;
+	struct nouveau_sched_entity sched_entity;
 };
 
 struct nouveau_abi16 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 989f30a31ba9..5d018207ff92 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -71,6 +71,7 @@
 #include "nouveau_svm.h"
 #include "nouveau_dmem.h"
 #include "nouveau_uvmm.h"
+#include "nouveau_sched.h"
 
 DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
 			"DRM_UT_CORE",
@@ -192,6 +193,7 @@ nouveau_cli_fini(struct nouveau_cli *cli)
 	flush_work(&cli->work);
 	WARN_ON(!list_empty(&cli->worker));
 
+	nouveau_sched_entity_fini(&cli->sched_entity);
 	usif_client_fini(cli);
 	nouveau_uvmm_fini(&cli->uvmm);
 	nouveau_vmm_fini(&cli->svm);
@@ -299,6 +301,11 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
 	}
 
 	cli->mem = &mems[ret];
+
+	ret = nouveau_sched_entity_init(&cli->sched_entity, &drm->sched);
+	if (ret)
+		goto done;
+
 	return 0;
 done:
 	if (ret)
@@ -611,8 +618,13 @@ nouveau_drm_device_init(struct drm_device *dev)
 		pm_runtime_put(dev->dev);
 	}
 
-	return 0;
+	ret = nouveau_sched_init(&drm->sched, drm);
+	if (ret)
+		goto fail_sched_init;
 
+	return 0;
+fail_sched_init:
+	nouveau_display_fini(dev, false, false);
 fail_dispinit:
 	nouveau_display_destroy(dev);
 fail_dispctor:
@@ -637,6 +649,8 @@ nouveau_drm_device_fini(struct drm_device *dev)
 	struct nouveau_cli *cli, *temp_cli;
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
+	nouveau_sched_fini(&drm->sched);
+
 	if (nouveau_pmops_runtime()) {
 		pm_runtime_get_sync(dev->dev);
 		pm_runtime_forbid(dev->dev);
@@ -1177,6 +1191,9 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_VM_INIT, nouveau_ioctl_vm_init, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_VM_BIND, nouveau_ioctl_vm_bind, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_EXEC, nouveau_ioctl_exec, DRM_RENDER_ALLOW),
 };
 
 long
@@ -1224,7 +1241,9 @@ nouveau_driver_fops = {
 static struct drm_driver
 driver_stub = {
 	.driver_features =
-		DRIVER_GEM | DRIVER_MODESET | DRIVER_RENDER
+		DRIVER_GEM | DRIVER_MODESET | DRIVER_RENDER |
+		DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE |
+		DRIVER_GEM_GPUVA
 #if defined(CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT)
 		| DRIVER_KMS_LEGACY_CONTEXT
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index d634f1054d65..94de792ef3ca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -10,8 +10,8 @@
 #define DRIVER_DATE		"20120801"
 
 #define DRIVER_MAJOR		1
-#define DRIVER_MINOR		3
-#define DRIVER_PATCHLEVEL	1
+#define DRIVER_MINOR		4
+#define DRIVER_PATCHLEVEL	0
 
 /*
  * 1.1.1:
@@ -63,6 +63,7 @@ struct platform_device;
 
 #include "nouveau_fence.h"
 #include "nouveau_bios.h"
+#include "nouveau_sched.h"
 #include "nouveau_vmm.h"
 #include "nouveau_uvmm.h"
 
@@ -94,6 +95,8 @@ struct nouveau_cli {
 	struct nouveau_vmm svm;
 	struct nouveau_uvmm uvmm;
 
+	struct nouveau_sched_entity sched_entity;
+
 	const struct nvif_mclass *mem;
 
 	struct list_head head;
@@ -305,6 +308,8 @@ struct nouveau_drm {
 		struct mutex lock;
 		bool component_registered;
 	} audio;
+
+	struct drm_gpu_scheduler sched;
 };
 
 static inline struct nouveau_drm *
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c b/drivers/gpu/drm/nouveau/nouveau_exec.c
new file mode 100644
index 000000000000..512120bdb8a8
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -0,0 +1,310 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <drm/drm_exec.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_dma.h"
+#include "nouveau_exec.h"
+#include "nouveau_abi16.h"
+#include "nouveau_chan.h"
+#include "nouveau_sched.h"
+#include "nouveau_uvmm.h"
+
+
+/**
+ * DOC: Overview
+ *
+ * Nouveau's VM_BIND / EXEC UAPI consists of three ioctls: DRM_NOUVEAU_VM_INIT,
+ * DRM_NOUVEAU_VM_BIND and DRM_NOUVEAU_EXEC.
+ *
+ * In order to use the UAPI, a user client must first initialize the VA space
+ * using the DRM_NOUVEAU_VM_INIT ioctl, specifying which portion of the VA space
+ * should be managed by the kernel and which by the UMD.
+ *
+ * The DRM_NOUVEAU_VM_BIND ioctl provides clients an interface to manage the
+ * userspace-manageable portion of the VA space. It provides operations to
+ * allocate and free VA space regions and operations to map and unmap memory
+ * within such a region. Bind operations crossing region boundaries are not
+ * permitted.
+ *
+ * When allocating a VA space region, userspace may flag this region as sparse.
+ * If a region is flagged as sparse, the kernel ensures that sparse mappings
+ * are created for the whole region. Subsequently requested memory-backed
+ * mappings within a sparse region take precedence over the sparse mappings.
+ * If the memory-backed mappings are unmapped, the kernel makes sure that
+ * sparse mappings take their place again.
+ *
+ * When using the VM_BIND ioctl to request the kernel to map memory to a given
+ * virtual address in the GPU's VA space, there is no guarantee that the actual
+ * mappings are created in the GPU's MMU. If the given memory is swapped out
+ * at the time the bind operation is executed, the kernel stashes the mapping
+ * details into its internal allocator and creates the actual MMU mappings once
+ * the memory is swapped back in. While this is transparent for userspace, it is
+ * guaranteed that all the backing memory is swapped back in and all the memory
+ * mappings, as previously requested by userspace, are actually in place once
+ * the DRM_NOUVEAU_EXEC ioctl is called to submit an exec job.
+ *
+ * Contrary to VM_BIND map requests, unmap requests are allowed to span over VA
+ * space regions and completely untouched areas of the VA space.
+ *
+ * Generally, all rules for cases like mapping and unmapping across
+ * boundaries of existing mappings are documented in the &drm_gpuva_manager.
+ *
+ * When a VA space region is freed, all existing mappings within this region are
+ * unmapped automatically.
+ *
+ * A VM_BIND job can be executed either synchronously or asynchronously. If
+ * executed asynchronously, userspace may provide a list of syncobjs this job
+ * will wait for and/or a list of syncobjs the kernel will signal once the
+ * VM_BIND job finished execution. If executed synchronously, the ioctl blocks
+ * until the bind job is finished; in this case no syncobjs are permitted.
+ *
+ * To execute a push buffer, the UAPI provides the DRM_NOUVEAU_EXEC ioctl. EXEC
+ * jobs are always executed asynchronously and, like VM_BIND jobs, provide
+ * the option to synchronize them with syncobjs.
+ *
+ * Besides that, an EXEC job can be scheduled on a specific channel.
+ *
+ * EXEC jobs wait for VM_BIND jobs they depend on when userspace submits the
+ * EXEC job rather than when this EXEC job actually executes. This is due to the
+ * fact that at submission time of the EXEC job we'd otherwise not have the
+ * correct view of the VA space for this EXEC job, since VM_BIND jobs this EXEC
+ * job depends on might still be in the queue. Without a recent (and hence
+ * correct for this particular job) view of the VA space, we'd potentially fail
+ * to lock, swap in and re-bind BOs that have been evicted previously.
+ */
+
+static int
+nouveau_exec_ucopy_syncs(struct nouveau_exec_base *base,
+			u32 inc, u64 ins,
+			u32 outc, u64 outs)
+{
+	struct drm_nouveau_sync **s;
+	int ret;
+
+	if (inc) {
+		s = &base->in_sync.s;
+
+		base->in_sync.count = inc;
+		*s = u_memcpya(ins, inc, sizeof(**s));
+		if (IS_ERR(*s)) {
+			ret = PTR_ERR(*s);
+			goto err_out;
+		}
+	}
+
+	if (outc) {
+		s = &base->out_sync.s;
+
+		base->out_sync.count = outc;
+		*s = u_memcpya(outs, outc, sizeof(**s));
+		if (IS_ERR(*s)) {
+			ret = PTR_ERR(*s);
+			goto err_free_ins;
+		}
+	}
+
+	return 0;
+
+err_free_ins:
+	u_free(base->in_sync.s);
+err_out:
+	return ret;
+}
+
+int
+nouveau_ioctl_vm_init(struct drm_device *dev,
+		      void *data,
+		      struct drm_file *file_priv)
+{
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct drm_nouveau_vm_init *init = data;
+
+	return nouveau_uvmm_init(&cli->uvmm, cli, init);
+}
+
+int nouveau_vm_bind(struct nouveau_exec_bind *bind)
+{
+	struct nouveau_bind_job *job;
+	int ret;
+
+	ret = nouveau_bind_job_init(&job, bind);
+	if (ret)
+		return ret;
+
+	ret = nouveau_job_submit(&job->base);
+	if (ret)
+		goto err_job_fini;
+
+	return 0;
+
+err_job_fini:
+	nouveau_job_fini(&job->base);
+	return ret;
+}
+
+int
+nouveau_ioctl_vm_bind(struct drm_device *dev,
+		      void *data,
+		      struct drm_file *file_priv)
+{
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct nouveau_exec_bind bind = {};
+	struct drm_nouveau_vm_bind *req = data;
+	int ret = 0;
+
+	if (unlikely(!nouveau_cli_uvmm_locked(cli)))
+		return -ENOSYS;
+
+	bind.flags = req->flags;
+
+	bind.op.count = req->op_count;
+	bind.op.s = u_memcpya(req->op_ptr, req->op_count,
+			      sizeof(*bind.op.s));
+	if (IS_ERR(bind.op.s))
+		return PTR_ERR(bind.op.s);
+
+	ret = nouveau_exec_ucopy_syncs(&bind.base,
+				       req->wait_count, req->wait_ptr,
+				       req->sig_count, req->sig_ptr);
+	if (ret)
+		goto out_free_ops;
+
+	bind.base.sched_entity = &cli->sched_entity;
+	bind.base.file_priv = file_priv;
+
+	ret = nouveau_vm_bind(&bind);
+	if (ret)
+		goto out_free_syncs;
+
+out_free_syncs:
+	u_free(bind.base.out_sync.s);
+	u_free(bind.base.in_sync.s);
+out_free_ops:
+	u_free(bind.op.s);
+	return ret;
+}
+
+static int
+nouveau_exec(struct nouveau_exec *exec)
+{
+	struct nouveau_exec_job *job;
+	int ret;
+
+	ret = nouveau_exec_job_init(&job, exec);
+	if (ret)
+		return ret;
+
+	ret = nouveau_job_submit(&job->base);
+	if (ret)
+		goto err_job_fini;
+
+	return 0;
+
+err_job_fini:
+	nouveau_job_fini(&job->base);
+	return ret;
+}
+
+int
+nouveau_ioctl_exec(struct drm_device *dev,
+		   void *data,
+		   struct drm_file *file_priv)
+{
+	struct nouveau_abi16 *abi16 = nouveau_abi16_get(file_priv);
+	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	struct nouveau_abi16_chan *chan16;
+	struct nouveau_channel *chan = NULL;
+	struct nouveau_exec exec = {};
+	struct drm_nouveau_exec *req = data;
+	int ret = 0;
+
+	if (unlikely(!abi16))
+		return -ENOMEM;
+
+	/* abi16 locks already */
+	if (unlikely(!nouveau_cli_uvmm(cli)))
+		return nouveau_abi16_put(abi16, -ENOSYS);
+
+	list_for_each_entry(chan16, &abi16->channels, head) {
+		if (chan16->chan->chid == req->channel) {
+			chan = chan16->chan;
+			break;
+		}
+	}
+
+	if (!chan)
+		return nouveau_abi16_put(abi16, -ENOENT);
+
+	if (unlikely(atomic_read(&chan->killed)))
+		return nouveau_abi16_put(abi16, -ENODEV);
+
+	if (!chan->dma.ib_max)
+		return nouveau_abi16_put(abi16, -ENOSYS);
+
+	if (unlikely(req->push_count == 0))
+		goto out;
+
+	if (unlikely(req->push_count > NOUVEAU_GEM_MAX_PUSH)) {
+		NV_PRINTK(err, cli, "pushbuf push count exceeds limit: %d max %d\n",
+			 req->push_count, NOUVEAU_GEM_MAX_PUSH);
+		return nouveau_abi16_put(abi16, -EINVAL);
+	}
+
+	exec.push.count = req->push_count;
+	exec.push.s = u_memcpya(req->push_ptr, req->push_count,
+				sizeof(*exec.push.s));
+	if (IS_ERR(exec.push.s)) {
+		ret = PTR_ERR(exec.push.s);
+		goto out;
+	}
+
+	ret = nouveau_exec_ucopy_syncs(&exec.base,
+				       req->wait_count, req->wait_ptr,
+				       req->sig_count, req->sig_ptr);
+	if (ret)
+		goto out_free_pushs;
+
+	exec.base.sched_entity = &chan16->sched_entity;
+	exec.base.chan = chan;
+	exec.base.file_priv = file_priv;
+
+	ret = nouveau_exec(&exec);
+	if (ret)
+		goto out_free_syncs;
+
+out_free_syncs:
+	u_free(exec.base.out_sync.s);
+	u_free(exec.base.in_sync.s);
+out_free_pushs:
+	u_free(exec.push.s);
+out:
+	return nouveau_abi16_put(abi16, ret);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.h b/drivers/gpu/drm/nouveau/nouveau_exec.h
new file mode 100644
index 000000000000..3774fc338f5d
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef __NOUVEAU_EXEC_H__
+#define __NOUVEAU_EXEC_H__
+
+#include <drm/drm_exec.h>
+
+#include "nouveau_drv.h"
+
+struct nouveau_exec_base {
+	struct nouveau_channel *chan;
+	struct drm_file *file_priv;
+	struct nouveau_sched_entity *sched_entity;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} in_sync;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} out_sync;
+};
+
+struct nouveau_exec_bind {
+	struct nouveau_exec_base base;
+	unsigned int flags;
+
+	struct {
+		struct drm_nouveau_vm_bind_op *s;
+		u32 count;
+	} op;
+};
+
+struct nouveau_exec {
+	struct nouveau_exec_base base;
+	struct drm_exec exec;
+
+	struct {
+		struct drm_nouveau_exec_push *s;
+		u32 count;
+	} push;
+};
+
+int nouveau_ioctl_vm_init(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+
+int nouveau_ioctl_vm_bind(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+
+int nouveau_ioctl_exec(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+
+#endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
new file mode 100644
index 000000000000..2749aa1908ad
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -0,0 +1,780 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright (c) 2022 Red Hat.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *     Danilo Krummrich <dakr@redhat.com>
+ *
+ */
+
+#include <linux/slab.h>
+#include <drm/gpu_scheduler.h>
+#include <drm/drm_syncobj.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_gem.h"
+#include "nouveau_mem.h"
+#include "nouveau_dma.h"
+#include "nouveau_exec.h"
+#include "nouveau_abi16.h"
+#include "nouveau_chan.h"
+#include "nouveau_sched.h"
+
+/* FIXME
+ *
+ * We want to make sure that jobs currently executing can't be deferred by
+ * other jobs competing for the hardware. Otherwise we might end up with job
+ * timeouts just because of too many clients submitting too many jobs. We don't
+ * want jobs to time out because of system load, but only because a job itself
+ * is too bulky.
+ *
+ * For now allow for up to 16 concurrent jobs in flight until we know how many
+ * rings the hardware can process in parallel.
+ */
+#define NOUVEAU_SCHED_HW_SUBMISSIONS		16
+#define NOUVEAU_SCHED_JOB_TIMEOUT_MS		10000
+
+#define list_for_each_op(_op, _ops) list_for_each_entry(_op, _ops, entry)
+#define list_for_each_op_safe(_op, _n, _ops) list_for_each_entry_safe(_op, _n, _ops, entry)
+
+enum bind_op {
+	OP_ALLOC = DRM_NOUVEAU_VM_BIND_OP_ALLOC,
+	OP_FREE = DRM_NOUVEAU_VM_BIND_OP_FREE,
+	OP_MAP = DRM_NOUVEAU_VM_BIND_OP_MAP,
+	OP_UNMAP = DRM_NOUVEAU_VM_BIND_OP_UNMAP,
+};
+
+struct bind_job_op {
+	struct list_head entry;
+
+	enum bind_op op;
+	u32 flags;
+
+	struct {
+		u64 addr;
+		u64 range;
+	} va;
+
+	struct {
+		u32 handle;
+		u64 offset;
+		struct drm_gem_object *obj;
+	} gem;
+};
+
+static int
+nouveau_base_job_init(struct nouveau_job *job,
+		      struct nouveau_exec_base *base)
+{
+	struct nouveau_sched_entity *entity = base->sched_entity;
+	int ret;
+
+	INIT_LIST_HEAD(&job->head);
+	job->file_priv = base->file_priv;
+	job->cli = nouveau_cli(base->file_priv);
+	job->chan = base->chan;
+	job->entity = entity;
+
+	job->in_sync.count = base->in_sync.count;
+	if (job->in_sync.count) {
+		if (job->sync)
+			return -EINVAL;
+
+		job->in_sync.s = kmemdup(base->in_sync.s,
+					 sizeof(*base->in_sync.s) *
+					 base->in_sync.count,
+					 GFP_KERNEL);
+		if (!job->in_sync.s)
+			return -ENOMEM;
+	}
+
+	job->out_sync.count = base->out_sync.count;
+	if (job->out_sync.count) {
+		if (job->sync) {
+			ret = -EINVAL;
+			goto err_free_in_sync;
+		}
+
+		job->out_sync.s = kmemdup(base->out_sync.s,
+					  sizeof(*base->out_sync.s) *
+					  base->out_sync.count,
+					  GFP_KERNEL);
+		if (!job->out_sync.s) {
+			ret = -ENOMEM;
+			goto err_free_in_sync;
+		}
+	}
+
+	ret = drm_sched_job_init(&job->base, &entity->base, NULL);
+	if (ret)
+		goto err_free_out_sync;
+
+	return 0;
+
+err_free_out_sync:
+	if (job->out_sync.s)
+		kfree(job->out_sync.s);
+err_free_in_sync:
+	if (job->in_sync.s)
+		kfree(job->in_sync.s);
+	return ret;
+}
+
+static void
+nouveau_base_job_free(struct nouveau_job *job)
+{
+	if (job->in_sync.s)
+		kfree(job->in_sync.s);
+
+	if (job->out_sync.s)
+		kfree(job->out_sync.s);
+}
+
+static int
+bind_submit_validate_op(struct nouveau_job *job,
+			struct bind_job_op *op)
+{
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct drm_gem_object *obj = op->gem.obj;
+
+	if (op->op == OP_MAP) {
+		if (op->gem.offset & ~PAGE_MASK)
+			return -EINVAL;
+
+		if (obj->size <= op->gem.offset)
+			return -EINVAL;
+
+		if (op->va.range > (obj->size - op->gem.offset))
+			return -EINVAL;
+	}
+
+	return nouveau_uvmm_validate_range(uvmm, op->va.addr, op->va.range);
+}
+
+int
+nouveau_bind_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+	struct bind_job_op *op;
+	int ret;
+
+	list_for_each_op(op, &bind_job->ops) {
+		switch (op->op) {
+		case OP_ALLOC:
+		case OP_FREE:
+		case OP_MAP:
+		case OP_UNMAP:
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (op->op == OP_MAP) {
+			op->gem.obj = drm_gem_object_lookup(job->file_priv,
+							    op->gem.handle);
+			if (!op->gem.obj)
+				return -ENOENT;
+		}
+
+		ret = bind_submit_validate_op(job, op);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+	struct bind_job_op *op;
+	int ret = 0;
+
+	nouveau_uvmm_lock(uvmm);
+	list_for_each_op(op, &bind_job->ops) {
+		switch (op->op) {
+		case OP_ALLOC: {
+			bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
+
+			ret = nouveau_uvma_region_new(uvmm,
+						      op->va.addr,
+						      op->va.range,
+						      sparse);
+			if (ret)
+				goto out_unlock;
+			break;
+		}
+		case OP_FREE:
+			ret = nouveau_uvma_region_destroy(uvmm,
+							  op->va.addr,
+							  op->va.range);
+			if (ret)
+				goto out_unlock;
+			break;
+		case OP_MAP:
+			ret = nouveau_uvmm_sm_map(uvmm,
+						  op->va.addr, op->va.range,
+						  op->gem.obj, op->gem.offset,
+						  op->flags & 0xff);
+			if (ret)
+				goto out_unlock;
+			break;
+		case OP_UNMAP:
+			ret = nouveau_uvmm_sm_unmap(uvmm,
+						    op->va.addr,
+						    op->va.range);
+			if (ret)
+				goto out_unlock;
+			break;
+		}
+	}
+
+out_unlock:
+	nouveau_uvmm_unlock(uvmm);
+	if (ret)
+		NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
+	return ERR_PTR(ret);
+}
+
+static void
+nouveau_bind_job_free(struct nouveau_job *job)
+{
+	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+	struct bind_job_op *op, *next;
+
+	list_for_each_op_safe(op, next, &bind_job->ops) {
+		struct drm_gem_object *obj = op->gem.obj;
+
+		if (obj)
+			drm_gem_object_put(obj);
+
+		list_del(&op->entry);
+		kfree(op);
+	}
+
+	nouveau_base_job_free(job);
+	kfree(bind_job);
+}
+
+static struct nouveau_job_ops nouveau_bind_job_ops = {
+	.submit = nouveau_bind_job_submit,
+	.run = nouveau_bind_job_run,
+	.free = nouveau_bind_job_free,
+};
+
+static int
+bind_job_op_from_uop(struct bind_job_op **pop,
+		     struct drm_nouveau_vm_bind_op *uop)
+{
+	struct bind_job_op *op;
+
+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	op->op = uop->op;
+	op->flags = uop->flags;
+	op->va.addr = uop->addr;
+	op->va.range = uop->range;
+
+	if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) {
+		op->gem.handle = uop->handle;
+		op->gem.offset = uop->bo_offset;
+	}
+
+	return 0;
+}
+
+static void
+bind_job_ops_free(struct list_head *ops)
+{
+	struct bind_job_op *op, *next;
+
+	list_for_each_op_safe(op, next, ops) {
+		list_del(&op->entry);
+		kfree(op);
+	}
+}
+
+int
+nouveau_bind_job_init(struct nouveau_bind_job **pjob,
+		      struct nouveau_exec_bind *bind)
+{
+	struct nouveau_bind_job *job;
+	struct bind_job_op *op;
+	int i, ret;
+
+	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&job->ops);
+
+	for (i = 0; i < bind->op.count; i++) {
+		ret = bind_job_op_from_uop(&op, &bind->op.s[i]);
+		if (ret)
+			goto err_free;
+
+		list_add_tail(&op->entry, &job->ops);
+	}
+
+	job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
+	job->base.ops = &nouveau_bind_job_ops;
+
+	ret = nouveau_base_job_init(&job->base, &bind->base);
+	if (ret)
+		goto err_free;
+
+	return 0;
+
+err_free:
+	bind_job_ops_free(&job->ops);
+	kfree(job);
+	*pjob = NULL;
+
+	return ret;
+}
+
+static int
+sync_find_fence(struct nouveau_job *job,
+		struct drm_nouveau_sync *sync,
+		struct dma_fence **fence)
+{
+	u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
+	u64 point = 0;
+	int ret;
+
+	if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
+	    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
+		return -EOPNOTSUPP;
+
+	if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
+		point = sync->timeline_value;
+
+	ret = drm_syncobj_find_fence(job->file_priv,
+				     sync->handle, point,
+				     sync->flags, fence);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int
+exec_job_binds_wait(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+	struct nouveau_cli *cli = exec_job->base.cli;
+	struct nouveau_sched_entity *bind_entity = &cli->sched_entity;
+	signed long ret;
+	int i;
+
+	for (i = 0; i < job->in_sync.count; i++) {
+		struct nouveau_job *it;
+		struct drm_nouveau_sync *sync = &job->in_sync.s[i];
+		struct dma_fence *fence;
+		bool found;
+
+		ret = sync_find_fence(job, sync, &fence);
+		if (ret)
+			return ret;
+
+		mutex_lock(&bind_entity->job.mutex);
+		found = false;
+		list_for_each_entry(it, &bind_entity->job.list, head) {
+			if (fence == it->done_fence) {
+				found = true;
+				break;
+			}
+		}
+		mutex_unlock(&bind_entity->job.mutex);
+
+		/* If the fence is not from a VM_BIND job, don't wait for it. */
+		if (!found)
+			continue;
+
+		ret = dma_fence_wait_timeout(fence, true,
+					     msecs_to_jiffies(500));
+		if (ret < 0)
+			return ret;
+		else if (ret == 0)
+			return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+int
+nouveau_exec_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+	struct nouveau_cli *cli = exec_job->base.cli;
+	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
+	struct drm_exec *exec = &job->exec;
+	struct drm_gem_object *obj;
+	unsigned long index;
+	int ret;
+
+	ret = exec_job_binds_wait(job);
+	if (ret)
+		return ret;
+
+	nouveau_uvmm_lock(uvmm);
+	drm_exec_while_not_all_locked(exec) {
+		struct drm_gpuva *va;
+
+		drm_gpuva_for_each_va(va, &uvmm->umgr) {
+			ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
+			drm_exec_break_on_contention(exec);
+			if (ret)
+				return ret;
+		}
+	}
+	nouveau_uvmm_unlock(uvmm);
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		struct dma_resv *resv = obj->resv;
+		struct nouveau_bo *nvbo = nouveau_gem_object(obj);
+
+		ret = nouveau_bo_validate(nvbo, true, false);
+		if (ret)
+			return ret;
+
+		dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE);
+	}
+
+	return 0;
+}
+
+static struct dma_fence *
+nouveau_exec_job_run(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+	struct nouveau_fence *fence;
+	int i, ret;
+
+	ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
+	if (ret) {
+		NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
+		return ERR_PTR(ret);
+	}
+
+	for (i = 0; i < exec_job->push.count; i++) {
+		nv50_dma_push(job->chan, exec_job->push.s[i].va,
+			      exec_job->push.s[i].va_len);
+	}
+
+	ret = nouveau_fence_new(job->chan, false, &fence);
+	if (ret) {
+		NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
+		WIND_RING(job->chan);
+		return ERR_PTR(ret);
+	}
+
+	return &fence->base;
+}
+static void
+nouveau_exec_job_free(struct nouveau_job *job)
+{
+	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
+
+	nouveau_base_job_free(job);
+
+	kfree(exec_job->push.s);
+	kfree(exec_job);
+}
+
+static struct nouveau_job_ops nouveau_exec_job_ops = {
+	.submit = nouveau_exec_job_submit,
+	.run = nouveau_exec_job_run,
+	.free = nouveau_exec_job_free,
+};
+
+int
+nouveau_exec_job_init(struct nouveau_exec_job **pjob,
+		      struct nouveau_exec *exec)
+{
+	struct nouveau_exec_job *job;
+	int ret;
+
+	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	job->push.count = exec->push.count;
+	job->push.s = kmemdup(exec->push.s,
+			      sizeof(*exec->push.s) *
+			      exec->push.count,
+			      GFP_KERNEL);
+	if (!job->push.s) {
+		ret = -ENOMEM;
+		goto err_free_job;
+	}
+
+	job->base.ops = &nouveau_exec_job_ops;
+	ret = nouveau_base_job_init(&job->base, &exec->base);
+	if (ret)
+		goto err_free_pushs;
+
+	return 0;
+
+err_free_pushs:
+	kfree(job->push.s);
+err_free_job:
+	kfree(job);
+	*pjob = NULL;
+
+	return ret;
+}
+
+void nouveau_job_fini(struct nouveau_job *job)
+{
+	dma_fence_put(job->done_fence);
+	drm_sched_job_cleanup(&job->base);
+	job->ops->free(job);
+}
+
+static int
+nouveau_job_add_deps(struct nouveau_job *job)
+{
+	struct dma_fence *in_fence = NULL;
+	int ret, i;
+
+	for (i = 0; i < job->in_sync.count; i++) {
+		struct drm_nouveau_sync *sync = &job->in_sync.s[i];
+
+		ret = sync_find_fence(job, sync, &in_fence);
+		if (ret) {
+			NV_PRINTK(warn, job->cli,
+				  "Failed to find syncobj (-> in): handle=%d\n",
+				  sync->handle);
+			return ret;
+		}
+
+		ret = drm_sched_job_add_dependency(&job->base, in_fence);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int
+nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence *fence)
+{
+	struct drm_syncobj *out_sync;
+	int i;
+
+	for (i = 0; i < job->out_sync.count; i++) {
+		struct drm_nouveau_sync *sync = &job->out_sync.s[i];
+		u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
+
+		if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
+		    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
+			return -EOPNOTSUPP;
+
+		out_sync = drm_syncobj_find(job->file_priv, sync->handle);
+		if (!out_sync) {
+			NV_PRINTK(warn, job->cli,
+				  "Failed to find syncobj (-> out): handle=%d\n",
+				  sync->handle);
+			return -ENOENT;
+		}
+
+		if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
+			struct dma_fence_chain *chain;
+
+			chain = dma_fence_chain_alloc();
+			if (!chain) {
+				drm_syncobj_put(out_sync);
+				return -ENOMEM;
+			}
+
+			drm_syncobj_add_point(out_sync, chain, fence,
+					      sync->timeline_value);
+		} else {
+			drm_syncobj_replace_fence(out_sync, fence);
+		}
+
+		drm_syncobj_put(out_sync);
+	}
+
+	return 0;
+}
+
+static struct dma_fence *
+nouveau_job_run(struct nouveau_job *job)
+{
+	return job->ops->run(job);
+}
+
+static int
+nouveau_job_run_sync(struct nouveau_job *job)
+{
+	struct dma_fence *fence;
+	int ret;
+
+	fence = nouveau_job_run(job);
+	if (IS_ERR(fence)) {
+		return PTR_ERR(fence);
+	} else if (fence) {
+		ret = dma_fence_wait(fence, true);
+		if (ret)
+			return ret;
+	}
+
+	dma_fence_signal(job->done_fence);
+
+	return 0;
+}
+
+int
+nouveau_job_submit(struct nouveau_job *job)
+{
+	struct nouveau_sched_entity *entity = to_nouveau_sched_entity(job->base.entity);
+	int ret;
+
+	drm_exec_init(&job->exec, true);
+
+	ret = nouveau_job_add_deps(job);
+	if (ret)
+		goto out;
+
+	drm_sched_job_arm(&job->base);
+	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	ret = nouveau_job_fence_attach(job, job->done_fence);
+	if (ret)
+		goto out;
+
+	if (job->ops->submit) {
+		ret = job->ops->submit(job);
+		if (ret)
+			goto out;
+	}
+
+	if (job->sync) {
+		drm_exec_fini(&job->exec);
+
+		/* We're requested to run a synchronous job, hence don't push
+		 * the job to the scheduler, but bypass it and execute the job's
+		 * run() function right away.
+		 *
+		 * As a consequence of bypassing the job scheduler we need to
+		 * handle fencing and job cleanup ourselves.
+		 */
+		ret = nouveau_job_run_sync(job);
+
+		/* If the job fails, the caller will do the cleanup for us. */
+		if (!ret)
+			nouveau_job_fini(job);
+
+		return ret;
+	} else {
+		mutex_lock(&entity->job.mutex);
+		drm_sched_entity_push_job(&job->base);
+		list_add_tail(&job->head, &entity->job.list);
+		mutex_unlock(&entity->job.mutex);
+	}
+
+out:
+	drm_exec_fini(&job->exec);
+	return ret;
+}
+
+static struct dma_fence *
+nouveau_sched_run_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+
+	return nouveau_job_run(job);
+}
+
+static enum drm_gpu_sched_stat
+nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+	struct nouveau_channel *chan = job->chan;
+
+	if (unlikely(!atomic_read(&chan->killed)))
+		nouveau_channel_kill(chan);
+
+	NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
+		  chan->chid);
+
+	nouveau_sched_entity_fini(job->entity);
+
+	return DRM_GPU_SCHED_STAT_ENODEV;
+}
+
+static void
+nouveau_sched_free_job(struct drm_sched_job *sched_job)
+{
+	struct nouveau_job *job = to_nouveau_job(sched_job);
+	struct nouveau_sched_entity *entity = job->entity;
+
+	mutex_lock(&entity->job.mutex);
+	list_del(&job->head);
+	mutex_unlock(&entity->job.mutex);
+
+	nouveau_job_fini(job);
+}
+
+int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
+			      struct drm_gpu_scheduler *sched)
+{
+
+	INIT_LIST_HEAD(&entity->job.list);
+	mutex_init(&entity->job.mutex);
+
+	return drm_sched_entity_init(&entity->base,
+				     DRM_SCHED_PRIORITY_NORMAL,
+				     &sched, 1, NULL);
+}
+
+void
+nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
+{
+	drm_sched_entity_destroy(&entity->base);
+}
+
+static const struct drm_sched_backend_ops nouveau_sched_ops = {
+	.run_job = nouveau_sched_run_job,
+	.timedout_job = nouveau_sched_timedout_job,
+	.free_job = nouveau_sched_free_job,
+};
+
+int nouveau_sched_init(struct drm_gpu_scheduler *sched,
+		       struct nouveau_drm *drm)
+{
+	long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
+
+	return drm_sched_init(sched, &nouveau_sched_ops,
+			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
+			      NULL, NULL, "nouveau", drm->dev->dev);
+}
+
+void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
+{
+	drm_sched_fini(sched);
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
new file mode 100644
index 000000000000..7fc5b7eea810
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef NOUVEAU_SCHED_H
+#define NOUVEAU_SCHED_H
+
+#include <linux/types.h>
+
+#include <drm/drm_exec.h>
+#include <drm/gpu_scheduler.h>
+
+#include "nouveau_drv.h"
+#include "nouveau_exec.h"
+
+#define to_nouveau_job(sched_job)		\
+		container_of((sched_job), struct nouveau_job, base)
+
+#define to_nouveau_exec_job(job)		\
+		container_of((job), struct nouveau_exec_job, base)
+
+#define to_nouveau_bind_job(job)		\
+		container_of((job), struct nouveau_bind_job, base)
+
+struct nouveau_job {
+	struct drm_sched_job base;
+	struct list_head head;
+
+	struct nouveau_sched_entity *entity;
+
+	struct drm_file *file_priv;
+	struct nouveau_cli *cli;
+	struct nouveau_channel *chan;
+
+	struct drm_exec exec;
+	struct dma_fence *done_fence;
+
+	bool sync;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} in_sync;
+
+	struct {
+		struct drm_nouveau_sync *s;
+		u32 count;
+	} out_sync;
+
+	struct nouveau_job_ops {
+		int (*submit)(struct nouveau_job *);
+		struct dma_fence *(*run)(struct nouveau_job *);
+		void (*free)(struct nouveau_job *);
+	} *ops;
+};
+
+struct nouveau_exec_job {
+	struct nouveau_job base;
+
+	struct {
+		struct drm_nouveau_exec_push *s;
+		u32 count;
+	} push;
+};
+
+struct nouveau_bind_job {
+	struct nouveau_job base;
+
+	/* struct bind_job_op */
+	struct list_head ops;
+};
+
+int nouveau_bind_job_init(struct nouveau_bind_job **job,
+			  struct nouveau_exec_bind *bind);
+int nouveau_exec_job_init(struct nouveau_exec_job **job,
+			  struct nouveau_exec *exec);
+
+int nouveau_job_submit(struct nouveau_job *job);
+void nouveau_job_fini(struct nouveau_job *job);
+
+#define to_nouveau_sched_entity(entity)		\
+		container_of((entity), struct nouveau_sched_entity, base)
+
+struct nouveau_sched_entity {
+	struct drm_sched_entity base;
+	struct {
+		struct list_head list;
+		struct mutex mutex;
+	} job;
+};
+
+int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
+			      struct drm_gpu_scheduler *sched);
+void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
+
+int nouveau_sched_init(struct drm_gpu_scheduler *sched,
+		       struct nouveau_drm *drm);
+void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
+
+#endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH drm-next 14/14] drm/nouveau: debugfs: implement DRM GPU VA debugfs
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (12 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
@ 2023-01-18  6:12 ` Danilo Krummrich
  2023-01-18  8:53 ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Christian König
  14 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18  6:12 UTC (permalink / raw)
  To: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel, Danilo Krummrich

Provide the driver indirection iterating over all DRM GPU VA spaces to
enable the common 'gpuvas' debugfs file for dumping DRM GPU VA spaces.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_debugfs.c | 24 +++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 2a36d1ca8fda..7f6ccc5d1d86 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -202,6 +202,29 @@ nouveau_debugfs_pstate_open(struct inode *inode, struct file *file)
 	return single_open(file, nouveau_debugfs_pstate_get, inode->i_private);
 }
 
+static int
+nouveau_debugfs_gpuva(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct nouveau_drm *drm = nouveau_drm(node->minor->dev);
+	struct nouveau_cli *cli;
+
+	mutex_lock(&drm->clients_lock);
+	list_for_each_entry(cli, &drm->clients, head) {
+		struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
+
+		if (!uvmm)
+			continue;
+
+		nouveau_uvmm_lock(uvmm);
+		drm_debugfs_gpuva_info(m, &uvmm->umgr);
+		nouveau_uvmm_unlock(uvmm);
+	}
+	mutex_unlock(&drm->clients_lock);
+
+	return 0;
+}
+
 static const struct file_operations nouveau_pstate_fops = {
 	.owner = THIS_MODULE,
 	.open = nouveau_debugfs_pstate_open,
@@ -213,6 +236,7 @@ static const struct file_operations nouveau_pstate_fops = {
 static struct drm_info_list nouveau_debugfs_list[] = {
 	{ "vbios.rom",  nouveau_debugfs_vbios_image, 0, NULL },
 	{ "strap_peek", nouveau_debugfs_strap_peek, 0, NULL },
+	DRM_DEBUGFS_GPUVA_INFO(nouveau_debugfs_gpuva, NULL),
 };
 #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list)
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj()
  2023-01-18  6:12 ` [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
@ 2023-01-18  8:51   ` Christian König
  2023-01-18 19:00     ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-18  8:51 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

That one should probably be squashed into the original patch.

Christian.

Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> Don't call drm_gem_object_get() unconditionally.
>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   drivers/gpu/drm/drm_exec.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> index ed2106c22786..5713a589a6a3 100644
> --- a/drivers/gpu/drm/drm_exec.c
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -282,7 +282,6 @@ int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
>   			goto error_unlock;
>   	}
>   
> -	drm_gem_object_get(obj);
>   	return 0;
>   
>   error_unlock:


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
                   ` (13 preceding siblings ...)
  2023-01-18  6:12 ` [PATCH drm-next 14/14] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
@ 2023-01-18  8:53 ` Christian König
  2023-01-18 15:34   ` Danilo Krummrich
  14 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-18  8:53 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> This patch series provides a new UAPI for the Nouveau driver in order to
> support Vulkan features, such as sparse bindings and sparse residency.
>
> Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
> keep track of GPU virtual address (VA) mappings in a more generic way.
>
> The DRM GPUVA manager is indented to help drivers implement userspace-manageable
> GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
> serves the following purposes in this context.
>
>      1) Provide a dedicated range allocator to track GPU VA allocations and
>         mappings, making use of the drm_mm range allocator.

This means that the ranges are allocated by the kernel? If yes that's a 
really really bad idea.

Regards,
Christian.

>
>      2) Generically connect GPU VA mappings to their backing buffers, in
>         particular DRM GEM objects.
>
>      3) Provide a common implementation to perform more complex mapping
>         operations on the GPU VA space. In particular splitting and merging
>         of GPU VA mappings, e.g. for intersecting mapping requests or partial
>         unmap requests.
>
> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, itself
> providing the following new interfaces.
>
>      1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>         for UMDs to specify the portion of VA space managed by the kernel and
>         userspace, respectively.
>
>      2) Allocate and free a VA space region as well as bind and unbind memory
>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>
>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>
> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use of the DRM
> scheduler to queue jobs and support asynchronous processing with DRM syncobjs
> as synchronization mechanism.
>
> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>
> The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution context
> for GEM buffers) by Christian König. Since the patch implementing drm_exec was
> not yet merged into drm-next it is part of this series, as well as a small fix
> for this patch, which was found while testing this series.
>
> This patch series is also available at [1].
>
> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> corresponding userspace parts for this series.
>
> The Vulkan CTS test suite passes the sparse binding and sparse residency test
> cases for the new UAPI together with Dave's Mesa work.
>
> There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
> and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
> VA manager's logic through Nouveau's new UAPI and should be considered just as
> helper for implementation.
>
> However, I absolutely intend to change those test cases to proper kunit test
> cases for the DRM GPUVA manager, once and if we agree on it's usefulness and
> design.
>
> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> [3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>
> I also want to give credit to Dave Airlie, who contributed a lot of ideas to
> this patch series.
>
> Christian König (1):
>    drm: execution context for GEM buffers
>
> Danilo Krummrich (13):
>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>    drm: manager to keep track of GPUs VA mappings
>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>    drm/nouveau: new VM_BIND uapi interfaces
>    drm/nouveau: get vmm via nouveau_cli_vmm()
>    drm/nouveau: bo: initialize GEM GPU VA interface
>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>    drm/nouveau: fence: fail to emit when fence context is killed
>    drm/nouveau: chan: provide nouveau_channel_kill()
>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>    drm/nouveau: implement uvmm for user mode bindings
>    drm/nouveau: implement new VM_BIND UAPI
>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>
>   Documentation/gpu/driver-uapi.rst             |   11 +
>   Documentation/gpu/drm-mm.rst                  |   43 +
>   drivers/gpu/drm/Kconfig                       |    6 +
>   drivers/gpu/drm/Makefile                      |    3 +
>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>   drivers/gpu/drm/drm_gem.c                     |    3 +
>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>   include/drm/drm_debugfs.h                     |   25 +
>   include/drm/drm_drv.h                         |    6 +
>   include/drm/drm_exec.h                        |  144 ++
>   include/drm/drm_gem.h                         |   75 +
>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>   include/uapi/drm/nouveau_drm.h                |  216 +++
>   47 files changed, 5266 insertions(+), 126 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>   create mode 100644 include/drm/drm_exec.h
>   create mode 100644 include/drm/drm_gpuva_mgr.h
>
>
> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
@ 2023-01-18 13:55   ` kernel test robot
  2023-01-18 15:47   ` kernel test robot
  1 sibling, 0 replies; 75+ messages in thread
From: kernel test robot @ 2023-01-18 13:55 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, christian.koenig, bskeggs,
	jason, tzimmermann, mripard, corbet
  Cc: oe-kbuild-all, nouveau, Danilo Krummrich, linux-kernel,
	dri-devel, linux-doc

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 0b45ac1170ea6416bc1d36798414c04870cd356d]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
base:   0b45ac1170ea6416bc1d36798414c04870cd356d
patch link:    https://lore.kernel.org/r/20230118061256.2689-5-dakr%40redhat.com
patch subject: [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
config: i386-randconfig-a003 (https://download.01.org/0day-ci/archive/20230118/202301182112.RFiF6tDh-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/e00f79934034ce7eb4e7fc0d722a3d28d75d44bf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
        git checkout e00f79934034ce7eb4e7fc0d722a3d28d75d44bf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/drm_debugfs.c: In function 'drm_debugfs_gpuva_info':
>> drivers/gpu/drm/drm_debugfs.c:228:28: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     228 |                            (u64)va->gem.obj, va->gem.offset);
         |                            ^


vim +228 drivers/gpu/drm/drm_debugfs.c

   178	
   179	/**
   180	 * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
   181	 * @m: pointer to the &seq_file to write
   182	 * @mgr: the &drm_gpuva_manager representing the GPU VA space
   183	 *
   184	 * Dumps the GPU VA regions and mappings of a given DRM GPU VA manager.
   185	 *
   186	 * For each DRM GPU VA space drivers should call this function from their
   187	 * &drm_info_list's show callback.
   188	 *
   189	 * Returns: 0 on success, -ENODEV if the &mgr is not initialized
   190	 */
   191	int drm_debugfs_gpuva_info(struct seq_file *m,
   192				   struct drm_gpuva_manager *mgr)
   193	{
   194		struct drm_gpuva_region *reg;
   195		struct drm_gpuva *va;
   196	
   197		if (!mgr->name)
   198			return -ENODEV;
   199	
   200		seq_printf(m, "DRM GPU VA space (%s)\n", mgr->name);
   201		seq_puts  (m, "\n");
   202		seq_puts  (m, " VA regions  | start              | range              | end                | sparse\n");
   203		seq_puts  (m, "------------------------------------------------------------------------------------\n");
   204		seq_printf(m, " VA space    | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   205			   mgr->mm_start, mgr->mm_range, mgr->mm_start + mgr->mm_range);
   206		seq_puts  (m, "-----------------------------------------------------------------------------------\n");
   207		drm_gpuva_for_each_region(reg, mgr) {
   208			struct drm_mm_node *node = &reg->node;
   209	
   210			if (node == &mgr->kernel_alloc_node) {
   211				seq_printf(m, " kernel node | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   212					   node->start, node->size, node->start + node->size);
   213				continue;
   214			}
   215	
   216			seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx | %s\n",
   217				   node->start, node->size, node->start + node->size,
   218				   reg->sparse ? "true" : "false");
   219		}
   220		seq_puts(m, "\n");
   221		seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
   222		seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
   223		drm_gpuva_for_each_va(va, mgr) {
   224			struct drm_mm_node *node = &va->node;
   225	
   226			seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
   227				   node->start, node->size, node->start + node->size,
 > 228				   (u64)va->gem.obj, va->gem.offset);
   229		}
   230	
   231		return 0;
   232	}
   233	EXPORT_SYMBOL(drm_debugfs_gpuva_info);
   234	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
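
One common way to silence this class of warning on 32-bit builds is to
cast the pointer through uintptr_t (or unsigned long) before widening it
to u64, so the pointer-to-integer step stays size-preserving. A minimal
sketch of what lines 226-228 above could look like with that change (an
illustration only, not necessarily the fix that was eventually applied):

		seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
			   node->start, node->size, node->start + node->size,
			   (u64)(uintptr_t)va->gem.obj, va->gem.offset);

Alternatively, printing the GEM object with a pointer format specifier
(the %p family) would avoid the cast entirely, at the cost of changing
the column layout of the dump.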


* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18  8:53 ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Christian König
@ 2023-01-18 15:34   ` Danilo Krummrich
  2023-01-18 15:37     ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18 15:34 UTC (permalink / raw)
  To: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

Hi Christian,

On 1/18/23 09:53, Christian König wrote:
> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>> This patch series provides a new UAPI for the Nouveau driver in order to
>> support Vulkan features, such as sparse bindings and sparse residency.
>>
>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>> feature to
>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>
>> The DRM GPUVA manager is indented to help drivers implement 
>> userspace-manageable
>> GPU VA spaces in reference to the Vulkan API. In order to achieve this 
>> goal it
>> serves the following purposes in this context.
>>
>>      1) Provide a dedicated range allocator to track GPU VA 
>> allocations and
>>         mappings, making use of the drm_mm range allocator.
> 
> This means that the ranges are allocated by the kernel? If yes that's a 
> really really bad idea.

No, it's just for keeping track of the ranges userspace has allocated.

- Danilo

> 
> Regards,
> Christian.
> 
>>
>>      2) Generically connect GPU VA mappings to their backing buffers, in
>>         particular DRM GEM objects.
>>
>>      3) Provide a common implementation to perform more complex mapping
>>         operations on the GPU VA space. In particular splitting and 
>> merging
>>         of GPU VA mappings, e.g. for intersecting mapping requests or 
>> partial
>>         unmap requests.
>>
>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>> itself
>> providing the following new interfaces.
>>
>>      1) Initialize a GPU VA space via the new 
>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>         for UMDs to specify the portion of VA space managed by the 
>> kernel and
>>         userspace, respectively.
>>
>>      2) Allocate and free a VA space region as well as bind and unbind 
>> memory
>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>
>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>> of the DRM
>> scheduler to queue jobs and support asynchronous processing with DRM 
>> syncobjs
>> as synchronization mechanism.
>>
>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>
>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution 
>> context
>> for GEM buffers) by Christian König. Since the patch implementing 
>> drm_exec was
>> not yet merged into drm-next it is part of this series, as well as a 
>> small fix
>> for this patch, which was found while testing this series.
>>
>> This patch series is also available at [1].
>>
>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>> corresponding userspace parts for this series.
>>
>> The Vulkan CTS test suite passes the sparse binding and sparse 
>> residency test
>> cases for the new UAPI together with Dave's Mesa work.
>>
>> There are also some test cases in the igt-gpu-tools project [3] for 
>> the new UAPI
>> and hence the DRM GPU VA manager. However, most of them are testing 
>> the DRM GPU
>> VA manager's logic through Nouveau's new UAPI and should be considered 
>> just as
>> helper for implementation.
>>
>> However, I absolutely intend to change those test cases to proper 
>> kunit test
>> cases for the DRM GPUVA manager, once and if we agree on it's 
>> usefulness and
>> design.
>>
>> [1] 
>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>> [3] 
>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>
>> I also want to give credit to Dave Airlie, who contributed a lot of 
>> ideas to
>> this patch series.
>>
>> Christian König (1):
>>    drm: execution context for GEM buffers
>>
>> Danilo Krummrich (13):
>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>    drm: manager to keep track of GPUs VA mappings
>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>    drm/nouveau: new VM_BIND uapi interfaces
>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>    drm/nouveau: fence: fail to emit when fence context is killed
>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>    drm/nouveau: implement uvmm for user mode bindings
>>    drm/nouveau: implement new VM_BIND UAPI
>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>
>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>   drivers/gpu/drm/Kconfig                       |    6 +
>>   drivers/gpu/drm/Makefile                      |    3 +
>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 +++++++++++++++++
>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>   include/drm/drm_debugfs.h                     |   25 +
>>   include/drm/drm_drv.h                         |    6 +
>>   include/drm/drm_exec.h                        |  144 ++
>>   include/drm/drm_gem.h                         |   75 +
>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>   create mode 100644 include/drm/drm_exec.h
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>>
>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
> 



* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 15:34   ` Danilo Krummrich
@ 2023-01-18 15:37     ` Christian König
  2023-01-18 16:19       ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-18 15:37 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> Hi Christian,
>
> On 1/18/23 09:53, Christian König wrote:
>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>> This patch series provides a new UAPI for the Nouveau driver in 
>>> order to
>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>
>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>>> feature to
>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>
>>> The DRM GPUVA manager is indented to help drivers implement 
>>> userspace-manageable
>>> GPU VA spaces in reference to the Vulkan API. In order to achieve 
>>> this goal it
>>> serves the following purposes in this context.
>>>
>>>      1) Provide a dedicated range allocator to track GPU VA 
>>> allocations and
>>>         mappings, making use of the drm_mm range allocator.
>>
>> This means that the ranges are allocated by the kernel? If yes that's 
>> a really really bad idea.
>
> No, it's just for keeping track of the ranges userspace has allocated.

Ok, that makes more sense.

So basically you have an IOCTL which asks the kernel for a free range? Or 
what exactly is the drm_mm used for here?

Regards,
Christian.

>
> - Danilo
>
>>
>> Regards,
>> Christian.
>>
>>>
>>>      2) Generically connect GPU VA mappings to their backing 
>>> buffers, in
>>>         particular DRM GEM objects.
>>>
>>>      3) Provide a common implementation to perform more complex mapping
>>>         operations on the GPU VA space. In particular splitting and 
>>> merging
>>>         of GPU VA mappings, e.g. for intersecting mapping requests 
>>> or partial
>>>         unmap requests.
>>>
>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>>> itself
>>> providing the following new interfaces.
>>>
>>>      1) Initialize a GPU VA space via the new 
>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>>         for UMDs to specify the portion of VA space managed by the 
>>> kernel and
>>>         userspace, respectively.
>>>
>>>      2) Allocate and free a VA space region as well as bind and 
>>> unbind memory
>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
>>> ioctl.
>>>
>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>
>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>>> of the DRM
>>> scheduler to queue jobs and support asynchronous processing with DRM 
>>> syncobjs
>>> as synchronization mechanism.
>>>
>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>
>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec 
>>> (execution context
>>> for GEM buffers) by Christian König. Since the patch implementing 
>>> drm_exec was
>>> not yet merged into drm-next it is part of this series, as well as a 
>>> small fix
>>> for this patch, which was found while testing this series.
>>>
>>> This patch series is also available at [1].
>>>
>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>>> corresponding userspace parts for this series.
>>>
>>> The Vulkan CTS test suite passes the sparse binding and sparse 
>>> residency test
>>> cases for the new UAPI together with Dave's Mesa work.
>>>
>>> There are also some test cases in the igt-gpu-tools project [3] for 
>>> the new UAPI
>>> and hence the DRM GPU VA manager. However, most of them are testing 
>>> the DRM GPU
>>> VA manager's logic through Nouveau's new UAPI and should be 
>>> considered just as
>>> helper for implementation.
>>>
>>> However, I absolutely intend to change those test cases to proper 
>>> kunit test
>>> cases for the DRM GPUVA manager, once and if we agree on it's 
>>> usefulness and
>>> design.
>>>
>>> [1] 
>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next 
>>> /
>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>>> [3] 
>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>>
>>> I also want to give credit to Dave Airlie, who contributed a lot of 
>>> ideas to
>>> this patch series.
>>>
>>> Christian König (1):
>>>    drm: execution context for GEM buffers
>>>
>>> Danilo Krummrich (13):
>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>>    drm: manager to keep track of GPUs VA mappings
>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>>    drm/nouveau: new VM_BIND uapi interfaces
>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>>    drm/nouveau: fence: fail to emit when fence context is killed
>>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>>    drm/nouveau: implement uvmm for user mode bindings
>>>    drm/nouveau: implement new VM_BIND UAPI
>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>>
>>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>>   drivers/gpu/drm/Kconfig                       |    6 +
>>>   drivers/gpu/drm/Makefile                      |    3 +
>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 
>>> +++++++++++++++++
>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>>   include/drm/drm_debugfs.h                     |   25 +
>>>   include/drm/drm_drv.h                         |    6 +
>>>   include/drm/drm_exec.h                        |  144 ++
>>>   include/drm/drm_gem.h                         |   75 +
>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>>   create mode 100644 include/drm/drm_exec.h
>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>
>>>
>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>>
>



* Re: [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
  2023-01-18 13:55   ` kernel test robot
@ 2023-01-18 15:47   ` kernel test robot
  1 sibling, 0 replies; 75+ messages in thread
From: kernel test robot @ 2023-01-18 15:47 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, christian.koenig, bskeggs,
	jason, tzimmermann, mripard, corbet
  Cc: oe-kbuild-all, nouveau, Danilo Krummrich, linux-kernel,
	dri-devel, linux-doc

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 0b45ac1170ea6416bc1d36798414c04870cd356d]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
base:   0b45ac1170ea6416bc1d36798414c04870cd356d
patch link:    https://lore.kernel.org/r/20230118061256.2689-5-dakr%40redhat.com
patch subject: [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space
config: parisc-randconfig-s041-20230115 (https://download.01.org/0day-ci/archive/20230118/202301182345.0gL7pjUf-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/e00f79934034ce7eb4e7fc0d722a3d28d75d44bf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
        git checkout e00f79934034ce7eb4e7fc0d722a3d28d75d44bf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

sparse warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/drm_debugfs.c:228:33: sparse: sparse: non size-preserving pointer to integer cast

vim +228 drivers/gpu/drm/drm_debugfs.c

   178	
   179	/**
   180	 * drm_debugfs_gpuva_info - dump the given DRM GPU VA space
   181	 * @m: pointer to the &seq_file to write
   182	 * @mgr: the &drm_gpuva_manager representing the GPU VA space
   183	 *
   184	 * Dumps the GPU VA regions and mappings of a given DRM GPU VA manager.
   185	 *
   186	 * For each DRM GPU VA space drivers should call this function from their
   187	 * &drm_info_list's show callback.
   188	 *
   189	 * Returns: 0 on success, -ENODEV if the &mgr is not initialized
   190	 */
   191	int drm_debugfs_gpuva_info(struct seq_file *m,
   192				   struct drm_gpuva_manager *mgr)
   193	{
   194		struct drm_gpuva_region *reg;
   195		struct drm_gpuva *va;
   196	
   197		if (!mgr->name)
   198			return -ENODEV;
   199	
   200		seq_printf(m, "DRM GPU VA space (%s)\n", mgr->name);
   201		seq_puts  (m, "\n");
   202		seq_puts  (m, " VA regions  | start              | range              | end                | sparse\n");
   203		seq_puts  (m, "------------------------------------------------------------------------------------\n");
   204		seq_printf(m, " VA space    | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   205			   mgr->mm_start, mgr->mm_range, mgr->mm_start + mgr->mm_range);
   206		seq_puts  (m, "-----------------------------------------------------------------------------------\n");
   207		drm_gpuva_for_each_region(reg, mgr) {
   208			struct drm_mm_node *node = &reg->node;
   209	
   210			if (node == &mgr->kernel_alloc_node) {
   211				seq_printf(m, " kernel node | 0x%016llx | 0x%016llx | 0x%016llx |   -\n",
   212					   node->start, node->size, node->start + node->size);
   213				continue;
   214			}
   215	
   216			seq_printf(m, "             | 0x%016llx | 0x%016llx | 0x%016llx | %s\n",
   217				   node->start, node->size, node->start + node->size,
   218				   reg->sparse ? "true" : "false");
   219		}
   220		seq_puts(m, "\n");
   221		seq_puts(m, " VAs | start              | range              | end                | object             | object offset\n");
   222		seq_puts(m, "-------------------------------------------------------------------------------------------------------------\n");
   223		drm_gpuva_for_each_va(va, mgr) {
   224			struct drm_mm_node *node = &va->node;
   225	
   226			seq_printf(m, "     | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx | 0x%016llx\n",
   227				   node->start, node->size, node->start + node->size,
 > 228				   (u64)va->gem.obj, va->gem.offset);
   229		}
   230	
   231		return 0;
   232	}
   233	EXPORT_SYMBOL(drm_debugfs_gpuva_info);
   234	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 15:37     ` Christian König
@ 2023-01-18 16:19       ` Danilo Krummrich
  2023-01-18 16:30         ` Alex Deucher
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18 16:19 UTC (permalink / raw)
  To: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

On 1/18/23 16:37, Christian König wrote:
> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>> Hi Christian,
>>
>> On 1/18/23 09:53, Christian König wrote:
>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>> This patch series provides a new UAPI for the Nouveau driver in 
>>>> order to
>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>
>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core 
>>>> feature to
>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>
>>>> The DRM GPUVA manager is indented to help drivers implement 
>>>> userspace-manageable
>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve 
>>>> this goal it
>>>> serves the following purposes in this context.
>>>>
>>>>      1) Provide a dedicated range allocator to track GPU VA 
>>>> allocations and
>>>>         mappings, making use of the drm_mm range allocator.
>>>
>>> This means that the ranges are allocated by the kernel? If yes that's 
>>> a really really bad idea.
>>
>> No, it's just for keeping track of the ranges userspace has allocated.
> 
> Ok, that makes more sense.
> 
> So basically you have an IOCTL which asks kernel for a free range? Or 
> what exactly is the drm_mm used for here?

Not even that: userspace provides both the base address and the range;
the kernel really just keeps track of things. That said, writing a UAPI
on top of the GPUVA manager that asks for a free range instead would be
possible by just adding the corresponding wrapper functions to get a
free hole.

Currently (and I think that's what your question is getting at) the
main benefit of using drm_mm over simply stuffing the entries into a
list or something is easier collision detection and easier iteration
over sub-ranges of the whole VA space.
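
To make that concrete, a minimal sketch of tracking a userspace-chosen
range with drm_mm (the helper names and variables are invented for
illustration; this is not code from the series):

	#include <drm/drm_mm.h>
	#include <linux/printk.h>

	/* Reserve a region whose base address and size were chosen entirely
	 * by userspace; drm_mm does not pick the address, it only detects
	 * collisions with ranges already tracked.  @node is assumed to be
	 * zero-initialized by the caller. */
	static int track_userspace_range(struct drm_mm *mm,
					 struct drm_mm_node *node,
					 u64 addr, u64 range)
	{
		node->start = addr;
		node->size = range;

		/* fails with -ENOSPC if [addr, addr + range) is occupied */
		return drm_mm_reserve_node(mm, node);
	}

	/* Iterate only the nodes intersecting a sub-range of the VA space. */
	static void dump_sub_range(struct drm_mm *mm, u64 start, u64 end)
	{
		struct drm_mm_node *it;

		drm_mm_for_each_node_in_range(it, mm, start, end)
			pr_info("tracked: [0x%llx, 0x%llx)\n",
				it->start, it->start + it->size);
	}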

> 
> Regards,
> Christian.
> 
>>
>> - Danilo
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>      2) Generically connect GPU VA mappings to their backing 
>>>> buffers, in
>>>>         particular DRM GEM objects.
>>>>
>>>>      3) Provide a common implementation to perform more complex mapping
>>>>         operations on the GPU VA space. In particular splitting and 
>>>> merging
>>>>         of GPU VA mappings, e.g. for intersecting mapping requests 
>>>> or partial
>>>>         unmap requests.
>>>>
>>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, 
>>>> itself
>>>> providing the following new interfaces.
>>>>
>>>>      1) Initialize a GPU VA space via the new 
>>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>>>         for UMDs to specify the portion of VA space managed by the 
>>>> kernel and
>>>>         userspace, respectively.
>>>>
>>>>      2) Allocate and free a VA space region as well as bind and 
>>>> unbind memory
>>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
>>>> ioctl.
>>>>
>>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>
>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use 
>>>> of the DRM
>>>> scheduler to queue jobs and support asynchronous processing with DRM 
>>>> syncobjs
>>>> as synchronization mechanism.
>>>>
>>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>
>>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec 
>>>> (execution context
>>>> for GEM buffers) by Christian König. Since the patch implementing 
>>>> drm_exec was
>>>> not yet merged into drm-next it is part of this series, as well as a 
>>>> small fix
>>>> for this patch, which was found while testing this series.
>>>>
>>>> This patch series is also available at [1].
>>>>
>>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>>>> corresponding userspace parts for this series.
>>>>
>>>> The Vulkan CTS test suite passes the sparse binding and sparse 
>>>> residency test
>>>> cases for the new UAPI together with Dave's Mesa work.
>>>>
>>>> There are also some test cases in the igt-gpu-tools project [3] for 
>>>> the new UAPI
>>>> and hence the DRM GPU VA manager. However, most of them are testing 
>>>> the DRM GPU
>>>> VA manager's logic through Nouveau's new UAPI and should be 
>>>> considered just as
>>>> helper for implementation.
>>>>
>>>> However, I absolutely intend to change those test cases to proper 
>>>> kunit test
>>>> cases for the DRM GPUVA manager, once and if we agree on it's 
>>>> usefulness and
>>>> design.
>>>>
>>>> [1] 
>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>>>> [3] 
>>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>>>
>>>> I also want to give credit to Dave Airlie, who contributed a lot of 
>>>> ideas to
>>>> this patch series.
>>>>
>>>> Christian König (1):
>>>>    drm: execution context for GEM buffers
>>>>
>>>> Danilo Krummrich (13):
>>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
>>>>    drm: manager to keep track of GPUs VA mappings
>>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>>>    drm/nouveau: new VM_BIND uapi interfaces
>>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>>>    drm/nouveau: fence: fail to emit when fence context is killed
>>>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>>>    drm/nouveau: implement uvmm for user mode bindings
>>>>    drm/nouveau: implement new VM_BIND UAPI
>>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>>>
>>>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>>>   Documentation/gpu/drm-mm.rst                  |   43 +
>>>>   drivers/gpu/drm/Kconfig                       |    6 +
>>>>   drivers/gpu/drm/Makefile                      |    3 +
>>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323 
>>>> +++++++++++++++++
>>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>>>   include/drm/drm_debugfs.h                     |   25 +
>>>>   include/drm/drm_drv.h                         |    6 +
>>>>   include/drm/drm_exec.h                        |  144 ++
>>>>   include/drm/drm_gem.h                         |   75 +
>>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
>>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
>>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>>>   create mode 100644 include/drm/drm_exec.h
>>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>>
>>>>
>>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>>>
>>
> 



* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 16:19       ` Danilo Krummrich
@ 2023-01-18 16:30         ` Alex Deucher
  2023-01-18 16:50           ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Alex Deucher @ 2023-01-18 16:30 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet, nouveau, linux-kernel, dri-devel,
	linux-doc

On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>
> On 1/18/23 16:37, Christian König wrote:
> > Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> >> Hi Christian,
> >>
> >> On 1/18/23 09:53, Christian König wrote:
> >>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> >>>> This patch series provides a new UAPI for the Nouveau driver in
> >>>> order to
> >>>> support Vulkan features, such as sparse bindings and sparse residency.
> >>>>
> >>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> >>>> feature to
> >>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> >>>>
> >>>> The DRM GPUVA manager is indented to help drivers implement
> >>>> userspace-manageable
> >>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> >>>> this goal it
> >>>> serves the following purposes in this context.
> >>>>
> >>>>      1) Provide a dedicated range allocator to track GPU VA
> >>>> allocations and
> >>>>         mappings, making use of the drm_mm range allocator.
> >>>
> >>> This means that the ranges are allocated by the kernel? If yes that's
> >>> a really really bad idea.
> >>
> >> No, it's just for keeping track of the ranges userspace has allocated.
> >
> > Ok, that makes more sense.
> >
> > So basically you have an IOCTL which asks kernel for a free range? Or
> > what exactly is the drm_mm used for here?
>
> Not even that, userspace provides both the base address and the range,
> the kernel really just keeps track of things. Though, writing a UAPI on
> top of the GPUVA manager asking for a free range instead would be
> possible by just adding the corresponding wrapper functions to get a
> free hole.
>
> Currently, and that's what I think I read out of your question, the main
> benefit of using drm_mm over simply stuffing the entries into a list or
> something boils down to easier collision detection and iterating
> sub-ranges of the whole VA space.

Why not just do this in userspace?  We have a range manager in
libdrm_amdgpu that you could lift out into libdrm or some other
helper.

Alex


>
> >
> > Regards,
> > Christian.
> >
> >>
> >> - Danilo
> >>
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>>      2) Generically connect GPU VA mappings to their backing
> >>>> buffers, in
> >>>>         particular DRM GEM objects.
> >>>>
> >>>>      3) Provide a common implementation to perform more complex mapping
> >>>>         operations on the GPU VA space. In particular splitting and
> >>>> merging
> >>>>         of GPU VA mappings, e.g. for intersecting mapping requests
> >>>> or partial
> >>>>         unmap requests.
> >>>>
> >>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager,
> >>>> itself
> >>>> providing the following new interfaces.
> >>>>
> >>>>      1) Initialize a GPU VA space via the new
> >>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
> >>>>         for UMDs to specify the portion of VA space managed by the
> >>>> kernel and
> >>>>         userspace, respectively.
> >>>>
> >>>>      2) Allocate and free a VA space region as well as bind and
> >>>> unbind memory
> >>>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND
> >>>> ioctl.
> >>>>
> >>>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> >>>>
> >>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use
> >>>> of the DRM
> >>>> scheduler to queue jobs and support asynchronous processing with DRM
> >>>> syncobjs
> >>>> as synchronization mechanism.
> >>>>
> >>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> >>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> >>>>
> >>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec
> >>>> (execution context
> >>>> for GEM buffers) by Christian König. Since the patch implementing
> >>>> drm_exec was
> >>>> not yet merged into drm-next it is part of this series, as well as a
> >>>> small fix
> >>>> for this patch, which was found while testing this series.
> >>>>
> >>>> This patch series is also available at [1].
> >>>>
> >>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> >>>> corresponding userspace parts for this series.
> >>>>
> >>>> The Vulkan CTS test suite passes the sparse binding and sparse
> >>>> residency test
> >>>> cases for the new UAPI together with Dave's Mesa work.
> >>>>
> >>>> There are also some test cases in the igt-gpu-tools project [3] for
> >>>> the new UAPI
> >>>> and hence the DRM GPU VA manager. However, most of them are testing
> >>>> the DRM GPU
> >>>> VA manager's logic through Nouveau's new UAPI and should be
> >>>> considered just as
> >>>> helper for implementation.
> >>>>
> >>>> However, I absolutely intend to change those test cases to proper
> >>>> kunit test
> >>>> cases for the DRM GPUVA manager, once and if we agree on it's
> >>>> usefulness and
> >>>> design.
> >>>>
> >>>> [1]
> >>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
> >>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> >>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> >>>> [3]
> >>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> >>>>
> >>>> I also want to give credit to Dave Airlie, who contributed a lot of
> >>>> ideas to
> >>>> this patch series.
> >>>>
> >>>> Christian König (1):
> >>>>    drm: execution context for GEM buffers
> >>>>
> >>>> Danilo Krummrich (13):
> >>>>    drm/exec: fix memory leak in drm_exec_prepare_obj()
> >>>>    drm: manager to keep track of GPUs VA mappings
> >>>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
> >>>>    drm/nouveau: new VM_BIND uapi interfaces
> >>>>    drm/nouveau: get vmm via nouveau_cli_vmm()
> >>>>    drm/nouveau: bo: initialize GEM GPU VA interface
> >>>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
> >>>>    drm/nouveau: fence: fail to emit when fence context is killed
> >>>>    drm/nouveau: chan: provide nouveau_channel_kill()
> >>>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
> >>>>    drm/nouveau: implement uvmm for user mode bindings
> >>>>    drm/nouveau: implement new VM_BIND UAPI
> >>>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
> >>>>
> >>>>   Documentation/gpu/driver-uapi.rst             |   11 +
> >>>>   Documentation/gpu/drm-mm.rst                  |   43 +
> >>>>   drivers/gpu/drm/Kconfig                       |    6 +
> >>>>   drivers/gpu/drm/Makefile                      |    3 +
> >>>>   drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
> >>>>   drivers/gpu/drm/drm_debugfs.c                 |   56 +
> >>>>   drivers/gpu/drm/drm_exec.c                    |  294 ++++
> >>>>   drivers/gpu/drm/drm_gem.c                     |    3 +
> >>>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1323
> >>>> +++++++++++++++++
> >>>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
> >>>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
> >>>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
> >>>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
> >>>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
> >>>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
> >>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
> >>>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
> >>>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
> >>>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
> >>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
> >>>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
> >>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
> >>>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
> >>>>   include/drm/drm_debugfs.h                     |   25 +
> >>>>   include/drm/drm_drv.h                         |    6 +
> >>>>   include/drm/drm_exec.h                        |  144 ++
> >>>>   include/drm/drm_gem.h                         |   75 +
> >>>>   include/drm/drm_gpuva_mgr.h                   |  527 +++++++
> >>>>   include/uapi/drm/nouveau_drm.h                |  216 +++
> >>>>   47 files changed, 5266 insertions(+), 126 deletions(-)
> >>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
> >>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
> >>>>   create mode 100644 include/drm/drm_exec.h
> >>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
> >>>>
> >>>>
> >>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
> >>>
> >>
> >
>


* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 16:30         ` Alex Deucher
@ 2023-01-18 16:50           ` Danilo Krummrich
  2023-01-18 16:54             ` Alex Deucher
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18 16:50 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet, nouveau, linux-kernel, dri-devel,
	linux-doc



On 1/18/23 17:30, Alex Deucher wrote:
> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>
>> On 1/18/23 16:37, Christian König wrote:
>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>> Hi Christian,
>>>>
>>>> On 1/18/23 09:53, Christian König wrote:
>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>> order to
>>>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>>>
>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
>>>>>> feature to
>>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>>>
>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>> userspace-manageable
>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>> this goal it
>>>>>> serves the following purposes in this context.
>>>>>>
>>>>>>       1) Provide a dedicated range allocator to track GPU VA
>>>>>> allocations and
>>>>>>          mappings, making use of the drm_mm range allocator.
>>>>>
>>>>> This means that the ranges are allocated by the kernel? If yes that's
>>>>> a really really bad idea.
>>>>
>>>> No, it's just for keeping track of the ranges userspace has allocated.
>>>
>>> Ok, that makes more sense.
>>>
>>> So basically you have an IOCTL which asks kernel for a free range? Or
>>> what exactly is the drm_mm used for here?
>>
>> Not even that, userspace provides both the base address and the range,
>> the kernel really just keeps track of things. Though, writing a UAPI on
>> top of the GPUVA manager asking for a free range instead would be
>> possible by just adding the corresponding wrapper functions to get a
>> free hole.
>>
>> Currently, and that's what I think I read out of your question, the main
>> benefit of using drm_mm over simply stuffing the entries into a list or
>> something boils down to easier collision detection and iterating
>> sub-ranges of the whole VA space.
> 
> Why not just do this in userspace?  We have a range manager in
> libdrm_amdgpu that you could lift out into libdrm or some other
> helper.

The kernel still needs to keep track of the mappings within the various 
VA spaces; for example, it needs to silently unmap mappings that are 
backed by BOs which get evicted and remap them once they're validated 
(or swapped back in).
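
As a rough illustration of why that bookkeeping has to live in the
kernel, an eviction path could walk the tracked VAs and tear down every
mapping backed by the evicted BO (drm_gpuva_for_each_va() and the
drm_gpuva fields are from this series; driver_unmap_range() is an
invented placeholder for the driver-specific page table update):

	/* Sketch only: unmap every tracked VA backed by a BO that just got
	 * evicted.  A mirror pass would remap them once the BO has been
	 * validated (or swapped back in) again. */
	static void evict_unmap_gpuvas(struct drm_gpuva_manager *mgr,
				       struct drm_gem_object *evicted)
	{
		struct drm_gpuva *va;

		drm_gpuva_for_each_va(va, mgr) {
			if (va->gem.obj != evicted)
				continue;

			/* driver-specific: zap the GPU page tables here */
			driver_unmap_range(va->node.start, va->node.size);
		}
	}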

> 
> Alex
> 
> 
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> - Danilo
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>       2) Generically connect GPU VA mappings to their backing
>>>>>> buffers, in
>>>>>>          particular DRM GEM objects.
>>>>>>
>>>>>>       3) Provide a common implementation to perform more complex mapping
>>>>>>          operations on the GPU VA space. In particular splitting and
>>>>>> merging
>>>>>>          of GPU VA mappings, e.g. for intersecting mapping requests
>>>>>> or partial
>>>>>>          unmap requests.
>>>>>>
>>>>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager,
>>>>>> itself
>>>>>> providing the following new interfaces.
>>>>>>
>>>>>>       1) Initialize a GPU VA space via the new
>>>>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>>>>>          for UMDs to specify the portion of VA space managed by the
>>>>>> kernel and
>>>>>>          userspace, respectively.
>>>>>>
>>>>>>       2) Allocate and free a VA space region as well as bind and
>>>>>> unbind memory
>>>>>>          to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND
>>>>>> ioctl.
>>>>>>
>>>>>>       3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>>>
>>>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use
>>>>>> of the DRM
>>>>>> scheduler to queue jobs and support asynchronous processing with DRM
>>>>>> syncobjs
>>>>>> as synchronization mechanism.
>>>>>>
>>>>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>>>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>>>
>>>>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec
>>>>>> (execution context
>>>>>> for GEM buffers) by Christian König. Since the patch implementing
>>>>>> drm_exec was
>>>>>> not yet merged into drm-next it is part of this series, as well as a
>>>>>> small fix
>>>>>> for this patch, which was found while testing this series.
>>>>>>
>>>>>> This patch series is also available at [1].
>>>>>>
>>>>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>>>>>> corresponding userspace parts for this series.
>>>>>>
>>>>>> The Vulkan CTS test suite passes the sparse binding and sparse
>>>>>> residency test
>>>>>> cases for the new UAPI together with Dave's Mesa work.
>>>>>>
>>>>>> There are also some test cases in the igt-gpu-tools project [3] for
>>>>>> the new UAPI
>>>>>> and hence the DRM GPU VA manager. However, most of them are testing
>>>>>> the DRM GPU
>>>>>> VA manager's logic through Nouveau's new UAPI and should be
>>>>>> considered just as
>>>>>> helper for implementation.
>>>>>>
>>>>>> However, I absolutely intend to change those test cases to proper
>>>>>> kunit test
>>>>>> cases for the DRM GPUVA manager, once and if we agree on it's
>>>>>> usefulness and
>>>>>> design.
>>>>>>
>>>>>> [1]
>>>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>>>>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>>>>>> [3]
>>>>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>>>>>
>>>>>> I also want to give credit to Dave Airlie, who contributed a lot of
>>>>>> ideas to
>>>>>> this patch series.
>>>>>>
>>>>>> Christian König (1):
>>>>>>     drm: execution context for GEM buffers
>>>>>>
>>>>>> Danilo Krummrich (13):
>>>>>>     drm/exec: fix memory leak in drm_exec_prepare_obj()
>>>>>>     drm: manager to keep track of GPUs VA mappings
>>>>>>     drm: debugfs: provide infrastructure to dump a DRM GPU VA space
>>>>>>     drm/nouveau: new VM_BIND uapi interfaces
>>>>>>     drm/nouveau: get vmm via nouveau_cli_vmm()
>>>>>>     drm/nouveau: bo: initialize GEM GPU VA interface
>>>>>>     drm/nouveau: move usercopy helpers to nouveau_drv.h
>>>>>>     drm/nouveau: fence: fail to emit when fence context is killed
>>>>>>     drm/nouveau: chan: provide nouveau_channel_kill()
>>>>>>     drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>>>>>     drm/nouveau: implement uvmm for user mode bindings
>>>>>>     drm/nouveau: implement new VM_BIND UAPI
>>>>>>     drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>>>>>
>>>>>>    Documentation/gpu/driver-uapi.rst             |   11 +
>>>>>>    Documentation/gpu/drm-mm.rst                  |   43 +
>>>>>>    drivers/gpu/drm/Kconfig                       |    6 +
>>>>>>    drivers/gpu/drm/Makefile                      |    3 +
>>>>>>    drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
>>>>>>    drivers/gpu/drm/drm_debugfs.c                 |   56 +
>>>>>>    drivers/gpu/drm/drm_exec.c                    |  294 ++++
>>>>>>    drivers/gpu/drm/drm_gem.c                     |    3 +
>>>>>>    drivers/gpu/drm/drm_gpuva_mgr.c               | 1323
>>>>>> +++++++++++++++++
>>>>>>    drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>>>>>    drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>>>>>    drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
>>>>>>    drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
>>>>>>    .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
>>>>>>    drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
>>>>>>    drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
>>>>>>    drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>>>>>    drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
>>>>>>    drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
>>>>>>    drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>>>>>    drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
>>>>>>    .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
>>>>>>    .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
>>>>>>    drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
>>>>>>    drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
>>>>>>    include/drm/drm_debugfs.h                     |   25 +
>>>>>>    include/drm/drm_drv.h                         |    6 +
>>>>>>    include/drm/drm_exec.h                        |  144 ++
>>>>>>    include/drm/drm_gem.h                         |   75 +
>>>>>>    include/drm/drm_gpuva_mgr.h                   |  527 +++++++
>>>>>>    include/uapi/drm/nouveau_drm.h                |  216 +++
>>>>>>    47 files changed, 5266 insertions(+), 126 deletions(-)
>>>>>>    create mode 100644 drivers/gpu/drm/drm_exec.c
>>>>>>    create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>>>>>    create mode 100644 include/drm/drm_exec.h
>>>>>>    create mode 100644 include/drm/drm_gpuva_mgr.h
>>>>>>
>>>>>>
>>>>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 16:50           ` Danilo Krummrich
@ 2023-01-18 16:54             ` Alex Deucher
  2023-01-18 19:17               ` Dave Airlie
  0 siblings, 1 reply; 75+ messages in thread
From: Alex Deucher @ 2023-01-18 16:54 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet, nouveau, linux-kernel, dri-devel,
	linux-doc

On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
>
>
>
> On 1/18/23 17:30, Alex Deucher wrote:
> > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
> >>
> >> On 1/18/23 16:37, Christian König wrote:
> >>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> >>>> Hi Christian,
> >>>>
> >>>> On 1/18/23 09:53, Christian König wrote:
> >>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> >>>>>> This patch series provides a new UAPI for the Nouveau driver in
> >>>>>> order to
> >>>>>> support Vulkan features, such as sparse bindings and sparse residency.
> >>>>>>
> >>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> >>>>>> feature to
> >>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> >>>>>>
> >>>>>> The DRM GPUVA manager is indented to help drivers implement
> >>>>>> userspace-manageable
> >>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> >>>>>> this goal it
> >>>>>> serves the following purposes in this context.
> >>>>>>
> >>>>>>       1) Provide a dedicated range allocator to track GPU VA
> >>>>>> allocations and
> >>>>>>          mappings, making use of the drm_mm range allocator.
> >>>>>
> >>>>> This means that the ranges are allocated by the kernel? If yes that's
> >>>>> a really really bad idea.
> >>>>
> >>>> No, it's just for keeping track of the ranges userspace has allocated.
> >>>
> >>> Ok, that makes more sense.
> >>>
> >>> So basically you have an IOCTL which asks kernel for a free range? Or
> >>> what exactly is the drm_mm used for here?
> >>
> >> Not even that, userspace provides both the base address and the range,
> >> the kernel really just keeps track of things. Though, writing a UAPI on
> >> top of the GPUVA manager asking for a free range instead would be
> >> possible by just adding the corresponding wrapper functions to get a
> >> free hole.
> >>
> >> Currently, and that's what I think I read out of your question, the main
> >> benefit of using drm_mm over simply stuffing the entries into a list or
> >> something boils down to easier collision detection and iterating
> >> sub-ranges of the whole VA space.
> >
> > Why not just do this in userspace?  We have a range manager in
> > libdrm_amdgpu that you could lift out into libdrm or some other
> > helper.
>
> The kernel still needs to keep track of the mappings within the various
> VA spaces, e.g. it silently needs to unmap mappings that are backed by
> BOs that get evicted and remap them once they're validated (or swapped
> back in).

Ok, you are just using this for maintaining the GPU VM space in the kernel.

Alex

>
> >
> > Alex
> >
> >
> >>
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>> - Danilo
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>>
> >>>>>>       2) Generically connect GPU VA mappings to their backing
> >>>>>> buffers, in
> >>>>>>          particular DRM GEM objects.
> >>>>>>
> >>>>>>       3) Provide a common implementation to perform more complex mapping
> >>>>>>          operations on the GPU VA space. In particular splitting and
> >>>>>> merging
> >>>>>>          of GPU VA mappings, e.g. for intersecting mapping requests
> >>>>>> or partial
> >>>>>>          unmap requests.
> >>>>>>
> >>>>>> The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager,
> >>>>>> itself
> >>>>>> providing the following new interfaces.
> >>>>>>
> >>>>>>       1) Initialize a GPU VA space via the new
> >>>>>> DRM_IOCTL_NOUVEAU_VM_INIT ioctl
> >>>>>>          for UMDs to specify the portion of VA space managed by the
> >>>>>> kernel and
> >>>>>>          userspace, respectively.
> >>>>>>
> >>>>>>       2) Allocate and free a VA space region as well as bind and
> >>>>>> unbind memory
> >>>>>>          to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND
> >>>>>> ioctl.
> >>>>>>
> >>>>>>       3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> >>>>>>
> >>>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use
> >>>>>> of the DRM
> >>>>>> scheduler to queue jobs and support asynchronous processing with DRM
> >>>>>> syncobjs
> >>>>>> as synchronization mechanism.
> >>>>>>
> >>>>>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> >>>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> >>>>>>
> >>>>>> The new VM_BIND UAPI for Nouveau makes also use of drm_exec
> >>>>>> (execution context
> >>>>>> for GEM buffers) by Christian König. Since the patch implementing
> >>>>>> drm_exec was
> >>>>>> not yet merged into drm-next it is part of this series, as well as a
> >>>>>> small fix
> >>>>>> for this patch, which was found while testing this series.
> >>>>>>
> >>>>>> This patch series is also available at [1].
> >>>>>>
> >>>>>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> >>>>>> corresponding userspace parts for this series.
> >>>>>>
> >>>>>> The Vulkan CTS test suite passes the sparse binding and sparse
> >>>>>> residency test
> >>>>>> cases for the new UAPI together with Dave's Mesa work.
> >>>>>>
> >>>>>> There are also some test cases in the igt-gpu-tools project [3] for
> >>>>>> the new UAPI
> >>>>>> and hence the DRM GPU VA manager. However, most of them are testing
> >>>>>> the DRM GPU
> >>>>>> VA manager's logic through Nouveau's new UAPI and should be
> >>>>>> considered just as
> >>>>>> helper for implementation.
> >>>>>>
> >>>>>> However, I absolutely intend to change those test cases to proper
> >>>>>> kunit test
> >>>>>> cases for the DRM GPUVA manager, once and if we agree on it's
> >>>>>> usefulness and
> >>>>>> design.
> >>>>>>
> >>>>>> [1]
> >>>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
> >>>>>> https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> >>>>>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> >>>>>> [3]
> >>>>>> https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> >>>>>>
> >>>>>> I also want to give credit to Dave Airlie, who contributed a lot of
> >>>>>> ideas to
> >>>>>> this patch series.
> >>>>>>
> >>>>>> Christian König (1):
> >>>>>>     drm: execution context for GEM buffers
> >>>>>>
> >>>>>> Danilo Krummrich (13):
> >>>>>>     drm/exec: fix memory leak in drm_exec_prepare_obj()
> >>>>>>     drm: manager to keep track of GPUs VA mappings
> >>>>>>     drm: debugfs: provide infrastructure to dump a DRM GPU VA space
> >>>>>>     drm/nouveau: new VM_BIND uapi interfaces
> >>>>>>     drm/nouveau: get vmm via nouveau_cli_vmm()
> >>>>>>     drm/nouveau: bo: initialize GEM GPU VA interface
> >>>>>>     drm/nouveau: move usercopy helpers to nouveau_drv.h
> >>>>>>     drm/nouveau: fence: fail to emit when fence context is killed
> >>>>>>     drm/nouveau: chan: provide nouveau_channel_kill()
> >>>>>>     drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
> >>>>>>     drm/nouveau: implement uvmm for user mode bindings
> >>>>>>     drm/nouveau: implement new VM_BIND UAPI
> >>>>>>     drm/nouveau: debugfs: implement DRM GPU VA debugfs
> >>>>>>
> >>>>>>    Documentation/gpu/driver-uapi.rst             |   11 +
> >>>>>>    Documentation/gpu/drm-mm.rst                  |   43 +
> >>>>>>    drivers/gpu/drm/Kconfig                       |    6 +
> >>>>>>    drivers/gpu/drm/Makefile                      |    3 +
> >>>>>>    drivers/gpu/drm/amd/amdgpu/Kconfig            |    1 +
> >>>>>>    drivers/gpu/drm/drm_debugfs.c                 |   56 +
> >>>>>>    drivers/gpu/drm/drm_exec.c                    |  294 ++++
> >>>>>>    drivers/gpu/drm/drm_gem.c                     |    3 +
> >>>>>>    drivers/gpu/drm/drm_gpuva_mgr.c               | 1323
> >>>>>> +++++++++++++++++
> >>>>>>    drivers/gpu/drm/nouveau/Kbuild                |    3 +
> >>>>>>    drivers/gpu/drm/nouveau/Kconfig               |    2 +
> >>>>>>    drivers/gpu/drm/nouveau/include/nvif/if000c.h |   23 +-
> >>>>>>    drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   17 +-
> >>>>>>    .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   10 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.c       |   23 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_bo.c          |  152 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_chan.c        |   16 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   24 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_drm.c         |   25 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_drv.h         |   92 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_exec.c        |  310 ++++
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_exec.h        |   55 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_fence.c       |    7 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_fence.h       |    2 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_gem.c         |   83 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_sched.c       |  780 ++++++++++
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_sched.h       |   98 ++
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  575 +++++++
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_uvmm.h        |   68 +
> >>>>>>    drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
> >>>>>>    drivers/gpu/drm/nouveau/nvif/vmm.c            |   73 +-
> >>>>>>    .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  168 ++-
> >>>>>>    .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.h    |    1 +
> >>>>>>    drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   32 +-
> >>>>>>    drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |    3 +
> >>>>>>    include/drm/drm_debugfs.h                     |   25 +
> >>>>>>    include/drm/drm_drv.h                         |    6 +
> >>>>>>    include/drm/drm_exec.h                        |  144 ++
> >>>>>>    include/drm/drm_gem.h                         |   75 +
> >>>>>>    include/drm/drm_gpuva_mgr.h                   |  527 +++++++
> >>>>>>    include/uapi/drm/nouveau_drm.h                |  216 +++
> >>>>>>    47 files changed, 5266 insertions(+), 126 deletions(-)
> >>>>>>    create mode 100644 drivers/gpu/drm/drm_exec.c
> >>>>>>    create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >>>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
> >>>>>>    create mode 100644 include/drm/drm_exec.h
> >>>>>>    create mode 100644 include/drm/drm_gpuva_mgr.h
> >>>>>>
> >>>>>>
> >>>>>> base-commit: 0b45ac1170ea6416bc1d36798414c04870cd356d
> >>>>>
> >>>>
> >>>
> >>
> >
>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj()
  2023-01-18  8:51   ` Christian König
@ 2023-01-18 19:00     ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-18 19:00 UTC (permalink / raw)
  To: Christian König, daniel, airlied, bskeggs, jason,
	tzimmermann, mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

On 1/18/23 09:51, Christian König wrote:
> That one should probably be squashed into the original patch.

Yes, I just wanted to make it obvious for you to pick it up in case you
had not already fixed it yourself.

> 
> Christian.
> 
> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>> Don't call drm_gem_object_get() unconditionally.
>>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   drivers/gpu/drm/drm_exec.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>> index ed2106c22786..5713a589a6a3 100644
>> --- a/drivers/gpu/drm/drm_exec.c
>> +++ b/drivers/gpu/drm/drm_exec.c
>> @@ -282,7 +282,6 @@ int drm_exec_prepare_obj(struct drm_exec *exec, 
>> struct drm_gem_object *obj,
>>               goto error_unlock;
>>       }
>> -    drm_gem_object_get(obj);
>>       return 0;
>>   error_unlock:
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 16:54             ` Alex Deucher
@ 2023-01-18 19:17               ` Dave Airlie
  2023-01-18 19:48                 ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Dave Airlie @ 2023-01-18 19:17 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Danilo Krummrich, tzimmermann, corbet, nouveau, dri-devel,
	linux-doc, linux-kernel, bskeggs, jason, airlied,
	Christian König

On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
> >
> >
> >
> > On 1/18/23 17:30, Alex Deucher wrote:
> > > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
> > >>
> > >> On 1/18/23 16:37, Christian König wrote:
> > >>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> > >>>> Hi Christian,
> > >>>>
> > >>>> On 1/18/23 09:53, Christian König wrote:
> > >>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> > >>>>>> This patch series provides a new UAPI for the Nouveau driver in
> > >>>>>> order to
> > >>>>>> support Vulkan features, such as sparse bindings and sparse residency.
> > >>>>>>
> > >>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
> > >>>>>> feature to
> > >>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
> > >>>>>>
> > >>>>>> The DRM GPUVA manager is indented to help drivers implement
> > >>>>>> userspace-manageable
> > >>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
> > >>>>>> this goal it
> > >>>>>> serves the following purposes in this context.
> > >>>>>>
> > >>>>>>       1) Provide a dedicated range allocator to track GPU VA
> > >>>>>> allocations and
> > >>>>>>          mappings, making use of the drm_mm range allocator.
> > >>>>>
> > >>>>> This means that the ranges are allocated by the kernel? If yes that's
> > >>>>> a really really bad idea.
> > >>>>
> > >>>> No, it's just for keeping track of the ranges userspace has allocated.
> > >>>
> > >>> Ok, that makes more sense.
> > >>>
> > >>> So basically you have an IOCTL which asks kernel for a free range? Or
> > >>> what exactly is the drm_mm used for here?
> > >>
> > >> Not even that, userspace provides both the base address and the range,
> > >> the kernel really just keeps track of things. Though, writing a UAPI on
> > >> top of the GPUVA manager asking for a free range instead would be
> > >> possible by just adding the corresponding wrapper functions to get a
> > >> free hole.
> > >>
> > >> Currently, and that's what I think I read out of your question, the main
> > >> benefit of using drm_mm over simply stuffing the entries into a list or
> > >> something boils down to easier collision detection and iterating
> > >> sub-ranges of the whole VA space.
> > >
> > > Why not just do this in userspace?  We have a range manager in
> > > libdrm_amdgpu that you could lift out into libdrm or some other
> > > helper.
> >
> > The kernel still needs to keep track of the mappings within the various
> > VA spaces, e.g. it silently needs to unmap mappings that are backed by
> > BOs that get evicted and remap them once they're validated (or swapped
> > back in).
>
> Ok, you are just using this for maintaining the GPU VM space in the kernel.
>

Yes, the idea behind having common code wrapping drm_mm for this is to
allow us to make the rules consistent across drivers.

Userspace (generally Vulkan, some compute) has interfaces that pretty
much dictate a lot of how VMA tracking works, especially around
lifetimes, sparse mappings and splitting/merging underlying page tables.
I'd really like this to be more consistent across drivers, because I
think we've already seen some divergence from amdgpu with freedreno, and
we also have i915/xe to deal with. I'd like to at least have one place
where we can say this is how it should work, since this is something
that *should* be mostly consistent across drivers, as it is more about
how the uapi is exposed.

Dave.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 19:17               ` Dave Airlie
@ 2023-01-18 19:48                 ` Christian König
  2023-01-19  4:04                   ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-18 19:48 UTC (permalink / raw)
  To: Dave Airlie, Alex Deucher
  Cc: Danilo Krummrich, tzimmermann, corbet, nouveau, dri-devel,
	linux-doc, linux-kernel, bskeggs, jason, airlied

Am 18.01.23 um 20:17 schrieb Dave Airlie:
> On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>>
>>>
>>> On 1/18/23 17:30, Alex Deucher wrote:
>>>> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> wrote:
>>>>> On 1/18/23 16:37, Christian König wrote:
>>>>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> On 1/18/23 09:53, Christian König wrote:
>>>>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>>>>> order to
>>>>>>>>> support Vulkan features, such as sparse bindings and sparse residency.
>>>>>>>>>
>>>>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM core
>>>>>>>>> feature to
>>>>>>>>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>>>>>>>>
>>>>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>>>>> userspace-manageable
>>>>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>>>>> this goal it
>>>>>>>>> serves the following purposes in this context.
>>>>>>>>>
>>>>>>>>>        1) Provide a dedicated range allocator to track GPU VA
>>>>>>>>> allocations and
>>>>>>>>>           mappings, making use of the drm_mm range allocator.
>>>>>>>> This means that the ranges are allocated by the kernel? If yes that's
>>>>>>>> a really really bad idea.
>>>>>>> No, it's just for keeping track of the ranges userspace has allocated.
>>>>>> Ok, that makes more sense.
>>>>>>
>>>>>> So basically you have an IOCTL which asks kernel for a free range? Or
>>>>>> what exactly is the drm_mm used for here?
>>>>> Not even that, userspace provides both the base address and the range,
>>>>> the kernel really just keeps track of things. Though, writing a UAPI on
>>>>> top of the GPUVA manager asking for a free range instead would be
>>>>> possible by just adding the corresponding wrapper functions to get a
>>>>> free hole.
>>>>>
>>>>> Currently, and that's what I think I read out of your question, the main
>>>>> benefit of using drm_mm over simply stuffing the entries into a list or
>>>>> something boils down to easier collision detection and iterating
>>>>> sub-ranges of the whole VA space.
>>>> Why not just do this in userspace?  We have a range manager in
>>>> libdrm_amdgpu that you could lift out into libdrm or some other
>>>> helper.
>>> The kernel still needs to keep track of the mappings within the various
>>> VA spaces, e.g. it silently needs to unmap mappings that are backed by
>>> BOs that get evicted and remap them once they're validated (or swapped
>>> back in).
>> Ok, you are just using this for maintaining the GPU VM space in the kernel.
>>
> Yes, the idea behind having common code wrapping drm_mm for this is to
> allow us to make the rules consistent across drivers.
>
> Userspace (generally Vulkan, some compute) has interfaces that pretty
> much dictate a lot of how VMA tracking works, especially around
> lifetimes, sparse mappings and splitting/merging underlying page tables.
> I'd really like this to be more consistent across drivers, because I
> think we've already seen some divergence from amdgpu with freedreno, and
> we also have i915/xe to deal with. I'd like to at least have one place
> where we can say this is how it should work, since this is something
> that *should* be mostly consistent across drivers, as it is more about
> how the uapi is exposed.

That's a really good idea, but the implementation with drm_mm won't work 
like that.

We have Vulkan applications which use the sparse feature to create
literally millions of mappings. That's why I have fine-tuned the mapping
structure in amdgpu down to ~80 bytes IIRC and try to save every CPU
cycle possible when handling them.

A drm_mm_node is more in the range of ~200 bytes and certainly not 
suitable for this kind of job.
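
Just to illustrate the size argument, a hypothetical, deliberately
compact per-mapping entry (neither amdgpu's actual structure nor what
this series implements) could look roughly like this and stays well
under 100 bytes on 64-bit:

#include <linux/rbtree.h>
#include <linux/types.h>

struct drm_gem_object;

struct compact_gpuva {
	struct rb_node rb;		/* address tree linkage */
	u64 addr;			/* GPU VA start */
	u64 range;			/* size of the mapping */
	u64 offset;			/* offset into the backing object */
	struct drm_gem_object *obj;	/* backing GEM object, if any */
	u32 flags;			/* e.g. sparse, read-only, ... */
};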

I strongly suggest using a good chunk of the amdgpu VM code as a
blueprint for the common infrastructure instead.

Regards,
Christian.

>
> Dave.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-18  6:12 ` [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
@ 2023-01-18 20:37   ` Thomas Hellström (Intel)
  2023-01-19  3:44     ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Thomas Hellström (Intel) @ 2023-01-18 20:37 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, christian.koenig, bskeggs,
	jason, tzimmermann, mripard, corbet
  Cc: nouveau, linux-kernel, dri-devel, linux-doc


On 1/18/23 07:12, Danilo Krummrich wrote:
> This commit provides the implementation for the new uapi motivated by the
> Vulkan API. It allows user mode drivers (UMDs) to:
>
> 1) Initialize a GPU virtual address (VA) space via the new
>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
>     space managed by the kernel and userspace, respectively.
>
> 2) Allocate and free a VA space region as well as bind and unbind memory
>     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>     UMDs can request the named operations to be processed either
>     synchronously or asynchronously. It supports DRM syncobjs
>     (incl. timelines) as synchronization mechanism. The management of the
>     GPU VA mappings is implemented with the DRM GPU VA manager.
>
> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
>     execution happens asynchronously. It supports DRM syncobj (incl.
>     timelines) as synchronization mechanism. DRM GEM object locking is
>     handled with drm_exec.
>
> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
> GPU scheduler for the asynchronous paths.
>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   Documentation/gpu/driver-uapi.rst       |   3 +
>   drivers/gpu/drm/nouveau/Kbuild          |   2 +
>   drivers/gpu/drm/nouveau/Kconfig         |   2 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
>   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
>   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
>   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
>   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++++++++++
>   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
>   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++
>   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
>   11 files changed, 1295 insertions(+), 4 deletions(-)
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
...
>
> +static struct dma_fence *
> +nouveau_bind_job_run(struct nouveau_job *job)
> +{
> +	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> +	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> +	struct bind_job_op *op;
> +	int ret = 0;
> +

I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function, being a scheduler run_job callback,
is the main part of the VM_BIND dma-fence signalling critical section
for the job's done_fence and, if so, needs to be annotated as such?

For example, nouveau_uvma_region_new() allocates memory, which is not
allowed in a dma-fence signalling critical section, and the locking
also looks suspicious?
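
Purely as illustration, a minimal sketch of how such an annotation could
look with the existing lockdep helpers from <linux/dma-fence.h>; the
actual processing of the bind ops is elided behind a placeholder:

static struct dma_fence *
nouveau_bind_job_run(struct nouveau_job *job)
{
	struct dma_fence *fence;
	bool cookie;

	/* Everything between begin/end is treated by lockdep as the
	 * dma-fence signalling critical section: no memory allocations
	 * that can recurse into reclaim, no locks that are also held
	 * while waiting on the job's done_fence.
	 */
	cookie = dma_fence_begin_signalling();

	/* Placeholder: process the pre-validated bind ops and return
	 * the HW fence here.
	 */
	fence = ERR_PTR(-ENOSYS);

	dma_fence_end_signalling(cookie);

	return fence;
}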

Thanks,

Thomas


> +	nouveau_uvmm_lock(uvmm);
> +	list_for_each_op(op, &bind_job->ops) {
> +		switch (op->op) {
> +		case OP_ALLOC: {
> +			bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
> +
> +			ret = nouveau_uvma_region_new(uvmm,
> +						      op->va.addr,
> +						      op->va.range,
> +						      sparse);
> +			if (ret)
> +				goto out_unlock;
> +			break;
> +		}
> +		case OP_FREE:
> +			ret = nouveau_uvma_region_destroy(uvmm,
> +							  op->va.addr,
> +							  op->va.range);
> +			if (ret)
> +				goto out_unlock;
> +			break;
> +		case OP_MAP:
> +			ret = nouveau_uvmm_sm_map(uvmm,
> +						  op->va.addr, op->va.range,
> +						  op->gem.obj, op->gem.offset,
> +						  op->flags && 0xff);
> +			if (ret)
> +				goto out_unlock;
> +			break;
> +		case OP_UNMAP:
> +			ret = nouveau_uvmm_sm_unmap(uvmm,
> +						    op->va.addr,
> +						    op->va.range);
> +			if (ret)
> +				goto out_unlock;
> +			break;
> +		}
> +	}
> +
> +out_unlock:
> +	nouveau_uvmm_unlock(uvmm);
> +	if (ret)
> +		NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
> +	return ERR_PTR(ret);
> +}
> +
> +static void
> +nouveau_bind_job_free(struct nouveau_job *job)
> +{
> +	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> +	struct bind_job_op *op, *next;
> +
> +	list_for_each_op_safe(op, next, &bind_job->ops) {
> +		struct drm_gem_object *obj = op->gem.obj;
> +
> +		if (obj)
> +			drm_gem_object_put(obj);
> +
> +		list_del(&op->entry);
> +		kfree(op);
> +	}
> +
> +	nouveau_base_job_free(job);
> +	kfree(bind_job);
> +}
> +
> +static struct nouveau_job_ops nouveau_bind_job_ops = {
> +	.submit = nouveau_bind_job_submit,
> +	.run = nouveau_bind_job_run,
> +	.free = nouveau_bind_job_free,
> +};
> +
> +static int
> +bind_job_op_from_uop(struct bind_job_op **pop,
> +		     struct drm_nouveau_vm_bind_op *uop)
> +{
> +	struct bind_job_op *op;
> +
> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	op->op = uop->op;
> +	op->flags = uop->flags;
> +	op->va.addr = uop->addr;
> +	op->va.range = uop->range;
> +
> +	if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) {
> +		op->gem.handle = uop->handle;
> +		op->gem.offset = uop->bo_offset;
> +	}
> +
> +	return 0;
> +}
> +
> +static void
> +bind_job_ops_free(struct list_head *ops)
> +{
> +	struct bind_job_op *op, *next;
> +
> +	list_for_each_op_safe(op, next, ops) {
> +		list_del(&op->entry);
> +		kfree(op);
> +	}
> +}
> +
> +int
> +nouveau_bind_job_init(struct nouveau_bind_job **pjob,
> +		      struct nouveau_exec_bind *bind)
> +{
> +	struct nouveau_bind_job *job;
> +	struct bind_job_op *op;
> +	int i, ret;
> +
> +	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
> +	if (!job)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&job->ops);
> +
> +	for (i = 0; i < bind->op.count; i++) {
> +		ret = bind_job_op_from_uop(&op, &bind->op.s[i]);
> +		if (ret)
> +			goto err_free;
> +
> +		list_add_tail(&op->entry, &job->ops);
> +	}
> +
> +	job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
> +	job->base.ops = &nouveau_bind_job_ops;
> +
> +	ret = nouveau_base_job_init(&job->base, &bind->base);
> +	if (ret)
> +		goto err_free;
> +
> +	return 0;
> +
> +err_free:
> +	bind_job_ops_free(&job->ops);
> +	kfree(job);
> +	*pjob = NULL;
> +
> +	return ret;
> +}
> +
> +static int
> +sync_find_fence(struct nouveau_job *job,
> +		struct drm_nouveau_sync *sync,
> +		struct dma_fence **fence)
> +{
> +	u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
> +	u64 point = 0;
> +	int ret;
> +
> +	if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
> +	    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> +		return -EOPNOTSUPP;
> +
> +	if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> +		point = sync->timeline_value;
> +
> +	ret = drm_syncobj_find_fence(job->file_priv,
> +				     sync->handle, point,
> +				     sync->flags, fence);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static int
> +exec_job_binds_wait(struct nouveau_job *job)
> +{
> +	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> +	struct nouveau_cli *cli = exec_job->base.cli;
> +	struct nouveau_sched_entity *bind_entity = &cli->sched_entity;
> +	signed long ret;
> +	int i;
> +
> +	for (i = 0; i < job->in_sync.count; i++) {
> +		struct nouveau_job *it;
> +		struct drm_nouveau_sync *sync = &job->in_sync.s[i];
> +		struct dma_fence *fence;
> +		bool found;
> +
> +		ret = sync_find_fence(job, sync, &fence);
> +		if (ret)
> +			return ret;
> +
> +		mutex_lock(&bind_entity->job.mutex);
> +		found = false;
> +		list_for_each_entry(it, &bind_entity->job.list, head) {
> +			if (fence == it->done_fence) {
> +				found = true;
> +				break;
> +			}
> +		}
> +		mutex_unlock(&bind_entity->job.mutex);
> +
> +		/* If the fence is not from a VM_BIND job, don't wait for it. */
> +		if (!found)
> +			continue;
> +
> +		ret = dma_fence_wait_timeout(fence, true,
> +					     msecs_to_jiffies(500));
> +		if (ret < 0)
> +			return ret;
> +		else if (ret == 0)
> +			return -ETIMEDOUT;
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +nouveau_exec_job_submit(struct nouveau_job *job)
> +{
> +	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> +	struct nouveau_cli *cli = exec_job->base.cli;
> +	struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
> +	struct drm_exec *exec = &job->exec;
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +	int ret;
> +
> +	ret = exec_job_binds_wait(job);
> +	if (ret)
> +		return ret;
> +
> +	nouveau_uvmm_lock(uvmm);
> +	drm_exec_while_not_all_locked(exec) {
> +		struct drm_gpuva *va;
> +
> +		drm_gpuva_for_each_va(va, &uvmm->umgr) {
> +			ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
> +			drm_exec_break_on_contention(exec);
> +			if (ret)
> +				return ret;
> +		}
> +	}
> +	nouveau_uvmm_unlock(uvmm);
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		struct dma_resv *resv = obj->resv;
> +		struct nouveau_bo *nvbo = nouveau_gem_object(obj);
> +
> +		ret = nouveau_bo_validate(nvbo, true, false);
> +		if (ret)
> +			return ret;
> +
> +		dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct dma_fence *
> +nouveau_exec_job_run(struct nouveau_job *job)
> +{
> +	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> +	struct nouveau_fence *fence;
> +	int i, ret;
> +
> +	ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
> +	if (ret) {
> +		NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
> +		return ERR_PTR(ret);
> +	}
> +
> +	for (i = 0; i < exec_job->push.count; i++) {
> +		nv50_dma_push(job->chan, exec_job->push.s[i].va,
> +			      exec_job->push.s[i].va_len);
> +	}
> +
> +	ret = nouveau_fence_new(job->chan, false, &fence);
> +	if (ret) {
> +		NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
> +		WIND_RING(job->chan);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return &fence->base;
> +}
> +static void
> +nouveau_exec_job_free(struct nouveau_job *job)
> +{
> +	struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> +
> +	nouveau_base_job_free(job);
> +
> +	kfree(exec_job->push.s);
> +	kfree(exec_job);
> +}
> +
> +static struct nouveau_job_ops nouveau_exec_job_ops = {
> +	.submit = nouveau_exec_job_submit,
> +	.run = nouveau_exec_job_run,
> +	.free = nouveau_exec_job_free,
> +};
> +
> +int
> +nouveau_exec_job_init(struct nouveau_exec_job **pjob,
> +		      struct nouveau_exec *exec)
> +{
> +	struct nouveau_exec_job *job;
> +	int ret;
> +
> +	job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
> +	if (!job)
> +		return -ENOMEM;
> +
> +	job->push.count = exec->push.count;
> +	job->push.s = kmemdup(exec->push.s,
> +			      sizeof(*exec->push.s) *
> +			      exec->push.count,
> +			      GFP_KERNEL);
> +	if (!job->push.s) {
> +		ret = -ENOMEM;
> +		goto err_free_job;
> +	}
> +
> +	job->base.ops = &nouveau_exec_job_ops;
> +	ret = nouveau_base_job_init(&job->base, &exec->base);
> +	if (ret)
> +		goto err_free_pushs;
> +
> +	return 0;
> +
> +err_free_pushs:
> +	kfree(job->push.s);
> +err_free_job:
> +	kfree(job);
> +	*pjob = NULL;
> +
> +	return ret;
> +}
> +
> +void nouveau_job_fini(struct nouveau_job *job)
> +{
> +	dma_fence_put(job->done_fence);
> +	drm_sched_job_cleanup(&job->base);
> +	job->ops->free(job);
> +}
> +
> +static int
> +nouveau_job_add_deps(struct nouveau_job *job)
> +{
> +	struct dma_fence *in_fence = NULL;
> +	int ret, i;
> +
> +	for (i = 0; i < job->in_sync.count; i++) {
> +		struct drm_nouveau_sync *sync = &job->in_sync.s[i];
> +
> +		ret = sync_find_fence(job, sync, &in_fence);
> +		if (ret) {
> +			NV_PRINTK(warn, job->cli,
> +				  "Failed to find syncobj (-> in): handle=%d\n",
> +				  sync->handle);
> +			return ret;
> +		}
> +
> +		ret = drm_sched_job_add_dependency(&job->base, in_fence);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence *fence)
> +{
> +	struct drm_syncobj *out_sync;
> +	int i;
> +
> +	for (i = 0; i < job->out_sync.count; i++) {
> +		struct drm_nouveau_sync *sync = &job->out_sync.s[i];
> +		u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
> +
> +		if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
> +		    stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> +			return -EOPNOTSUPP;
> +
> +		out_sync = drm_syncobj_find(job->file_priv, sync->handle);
> +		if (!out_sync) {
> +			NV_PRINTK(warn, job->cli,
> +				  "Failed to find syncobj (-> out): handle=%d\n",
> +				  sync->handle);
> +			return -ENOENT;
> +		}
> +
> +		if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
> +			struct dma_fence_chain *chain;
> +
> +			chain = dma_fence_chain_alloc();
> +			if (!chain) {
> +				drm_syncobj_put(out_sync);
> +				return -ENOMEM;
> +			}
> +
> +			drm_syncobj_add_point(out_sync, chain, fence,
> +					      sync->timeline_value);
> +		} else {
> +			drm_syncobj_replace_fence(out_sync, fence);
> +		}
> +
> +		drm_syncobj_put(out_sync);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct dma_fence *
> +nouveau_job_run(struct nouveau_job *job)
> +{
> +	return job->ops->run(job);
> +}
> +
> +static int
> +nouveau_job_run_sync(struct nouveau_job *job)
> +{
> +	struct dma_fence *fence;
> +	int ret;
> +
> +	fence = nouveau_job_run(job);
> +	if (IS_ERR(fence)) {
> +		return PTR_ERR(fence);
> +	} else if (fence) {
> +		ret = dma_fence_wait(fence, true);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	dma_fence_signal(job->done_fence);
> +
> +	return 0;
> +}
> +
> +int
> +nouveau_job_submit(struct nouveau_job *job)
> +{
> +	struct nouveau_sched_entity *entity = to_nouveau_sched_entity(job->base.entity);
> +	int ret;
> +
> +	drm_exec_init(&job->exec, true);
> +
> +	ret = nouveau_job_add_deps(job);
> +	if (ret)
> +		goto out;
> +
> +	drm_sched_job_arm(&job->base);
> +	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +	ret = nouveau_job_fence_attach(job, job->done_fence);
> +	if (ret)
> +		goto out;
> +
> +	if (job->ops->submit) {
> +		ret = job->ops->submit(job);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (job->sync) {
> +		drm_exec_fini(&job->exec);
> +
> +		/* We're requested to run a synchronous job, hence don't push
> +		 * the job, bypassing the job scheduler, and execute the jobs
> +		 * run() function right away.
> +		 *
> +		 * As a consequence of bypassing the job scheduler we need to
> +		 * handle fencing and job cleanup ourselfes.
> +		 */
> +		ret = nouveau_job_run_sync(job);
> +
> +		/* If the job fails, the caller will do the cleanup for us. */
> +		if (!ret)
> +			nouveau_job_fini(job);
> +
> +		return ret;
> +	} else {
> +		mutex_lock(&entity->job.mutex);
> +		drm_sched_entity_push_job(&job->base);
> +		list_add_tail(&job->head, &entity->job.list);
> +		mutex_unlock(&entity->job.mutex);
> +	}
> +
> +out:
> +	drm_exec_fini(&job->exec);
> +	return ret;
> +}
> +
> +static struct dma_fence *
> +nouveau_sched_run_job(struct drm_sched_job *sched_job)
> +{
> +	struct nouveau_job *job = to_nouveau_job(sched_job);
> +
> +	return nouveau_job_run(job);
> +}
> +
> +static enum drm_gpu_sched_stat
> +nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
> +{
> +	struct nouveau_job *job = to_nouveau_job(sched_job);
> +	struct nouveau_channel *chan = job->chan;
> +
> +	if (unlikely(!atomic_read(&chan->killed)))
> +		nouveau_channel_kill(chan);
> +
> +	NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
> +		  chan->chid);
> +
> +	nouveau_sched_entity_fini(job->entity);
> +
> +	return DRM_GPU_SCHED_STAT_ENODEV;
> +}
> +
> +static void
> +nouveau_sched_free_job(struct drm_sched_job *sched_job)
> +{
> +	struct nouveau_job *job = to_nouveau_job(sched_job);
> +	struct nouveau_sched_entity *entity = job->entity;
> +
> +	mutex_lock(&entity->job.mutex);
> +	list_del(&job->head);
> +	mutex_unlock(&entity->job.mutex);
> +
> +	nouveau_job_fini(job);
> +}
> +
> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
> +			      struct drm_gpu_scheduler *sched)
> +{
> +
> +	INIT_LIST_HEAD(&entity->job.list);
> +	mutex_init(&entity->job.mutex);
> +
> +	return drm_sched_entity_init(&entity->base,
> +				     DRM_SCHED_PRIORITY_NORMAL,
> +				     &sched, 1, NULL);
> +}
> +
> +void
> +nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
> +{
> +	drm_sched_entity_destroy(&entity->base);
> +}
> +
> +static const struct drm_sched_backend_ops nouveau_sched_ops = {
> +	.run_job = nouveau_sched_run_job,
> +	.timedout_job = nouveau_sched_timedout_job,
> +	.free_job = nouveau_sched_free_job,
> +};
> +
> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
> +		       struct nouveau_drm *drm)
> +{
> +	long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
> +
> +	return drm_sched_init(sched, &nouveau_sched_ops,
> +			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> +			      NULL, NULL, "nouveau", drm->dev->dev);
> +}
> +
> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
> +{
> +	drm_sched_fini(sched);
> +}
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
> new file mode 100644
> index 000000000000..7fc5b7eea810
> --- /dev/null
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: MIT
> +
> +#ifndef NOUVEAU_SCHED_H
> +#define NOUVEAU_SCHED_H
> +
> +#include <linux/types.h>
> +
> +#include <drm/drm_exec.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include "nouveau_drv.h"
> +#include "nouveau_exec.h"
> +
> +#define to_nouveau_job(sched_job)		\
> +		container_of((sched_job), struct nouveau_job, base)
> +
> +#define to_nouveau_exec_job(job)		\
> +		container_of((job), struct nouveau_exec_job, base)
> +
> +#define to_nouveau_bind_job(job)		\
> +		container_of((job), struct nouveau_bind_job, base)
> +
> +struct nouveau_job {
> +	struct drm_sched_job base;
> +	struct list_head head;
> +
> +	struct nouveau_sched_entity *entity;
> +
> +	struct drm_file *file_priv;
> +	struct nouveau_cli *cli;
> +	struct nouveau_channel *chan;
> +
> +	struct drm_exec exec;
> +	struct dma_fence *done_fence;
> +
> +	bool sync;
> +
> +	struct {
> +		struct drm_nouveau_sync *s;
> +		u32 count;
> +	} in_sync;
> +
> +	struct {
> +		struct drm_nouveau_sync *s;
> +		u32 count;
> +	} out_sync;
> +
> +	struct nouveau_job_ops {
> +		int (*submit)(struct nouveau_job *);
> +		struct dma_fence *(*run)(struct nouveau_job *);
> +		void (*free)(struct nouveau_job *);
> +	} *ops;
> +};
> +
> +struct nouveau_exec_job {
> +	struct nouveau_job base;
> +
> +	struct {
> +		struct drm_nouveau_exec_push *s;
> +		u32 count;
> +	} push;
> +};
> +
> +struct nouveau_bind_job {
> +	struct nouveau_job base;
> +
> +	/* struct bind_job_op */
> +	struct list_head ops;
> +};
> +
> +int nouveau_bind_job_init(struct nouveau_bind_job **job,
> +			  struct nouveau_exec_bind *bind);
> +int nouveau_exec_job_init(struct nouveau_exec_job **job,
> +			  struct nouveau_exec *exec);
> +
> +int nouveau_job_submit(struct nouveau_job *job);
> +void nouveau_job_fini(struct nouveau_job *job);
> +
> +#define to_nouveau_sched_entity(entity)		\
> +		container_of((entity), struct nouveau_sched_entity, base)
> +
> +struct nouveau_sched_entity {
> +	struct drm_sched_entity base;
> +	struct {
> +		struct list_head list;
> +		struct mutex mutex;
> +	} job;
> +};
> +
> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
> +			      struct drm_gpu_scheduler *sched);
> +void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
> +
> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
> +		       struct nouveau_drm *drm);
> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
> +
> +#endif

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-18 20:37   ` Thomas Hellström (Intel)
@ 2023-01-19  3:44     ` Danilo Krummrich
  2023-01-19  4:58       ` Matthew Brost
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-19  3:44 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet
  Cc: nouveau, linux-kernel, dri-devel, linux-doc

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> 
> On 1/18/23 07:12, Danilo Krummrich wrote:
>> This commit provides the implementation for the new uapi motivated by the
>> Vulkan API. It allows user mode drivers (UMDs) to:
>>
>> 1) Initialize a GPU virtual address (VA) space via the new
>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
>>     space managed by the kernel and userspace, respectively.
>>
>> 2) Allocate and free a VA space region as well as bind and unbind memory
>>     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>     UMDs can request the named operations to be processed either
>>     synchronously or asynchronously. It supports DRM syncobjs
>>     (incl. timelines) as synchronization mechanism. The management of the
>>     GPU VA mappings is implemented with the DRM GPU VA manager.
>>
>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
>>     execution happens asynchronously. It supports DRM syncobj (incl.
>>     timelines) as synchronization mechanism. DRM GEM object locking is
>>     handled with drm_exec.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
>> GPU scheduler for the asynchronous paths.
>>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/driver-uapi.rst       |   3 +
>>   drivers/gpu/drm/nouveau/Kbuild          |   2 +
>>   drivers/gpu/drm/nouveau/Kconfig         |   2 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
>>   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
>>   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
>>   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
>>   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
>>   11 files changed, 1295 insertions(+), 4 deletions(-)
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> ...
>>
>> +static struct dma_fence *
>> +nouveau_bind_job_run(struct nouveau_job *job)
>> +{
>> +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
>> +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
>> +    struct bind_job_op *op;
>> +    int ret = 0;
>> +
> 
> I was looking at how nouveau does the async binding compared to how xe
> does it.
> It looks to me that this function, being a scheduler run_job callback,
> is the main part of the VM_BIND dma-fence signalling critical section
> for the job's done_fence and, if so, needs to be annotated as such?

Yes, that's the case.

> 
> For example, nouveau_uvma_region_new() allocates memory, which is not
> allowed in a dma-fence signalling critical section, and the locking
> also looks suspicious?

Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables 
within the job's submit() function.
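
Roughly, a sketch of such a split could look like the following; struct
nouveau_job, struct bind_job_op, list_for_each_op() and OP_ALLOC are
taken from the patch, while the ->prealloc member and the *_prealloc()
helpers are hypothetical placeholders:

/* Process context (submit() path): allocations are fine here. */
static int
nouveau_bind_job_prealloc(struct nouveau_job *job)
{
	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
	struct bind_job_op *op;

	list_for_each_op(op, &bind_job->ops) {
		if (op->op != OP_ALLOC)
			continue;

		/* hypothetical: allocate region and page table memory now */
		op->prealloc = nouveau_uvma_region_prealloc(op->va.addr,
							    op->va.range);
		if (IS_ERR(op->prealloc))
			return PTR_ERR(op->prealloc);
	}

	return 0;
}

/* run() path, dma-fence signalling critical section: only consume what
 * submit() pre-allocated, no further allocations.
 */
static void
nouveau_bind_job_apply(struct nouveau_job *job)
{
	struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
	struct bind_job_op *op;

	list_for_each_op(op, &bind_job->ops) {
		if (op->op == OP_ALLOC)
			nouveau_uvma_region_insert_prealloc(op->prealloc);
	}
}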

For the ops structures the drm_gpuva_manager allocates to report the
split/merge steps back to the driver, I have ideas for avoiding
allocations entirely, which is also a good thing with respect to
Christian's feedback regarding the huge number of mapping requests some
applications seem to generate.
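
One possible shape of such a callback-based interface, purely as an
illustration (none of these names are the actual drm_gpuva_manager API
from this series):

/* Instead of returning a heap-allocated list of ops, the manager walks
 * the affected VA range and calls back into the driver for every
 * split/merge step. All names below are hypothetical.
 */
struct gpuva_sm_ops {
	int (*step_map)(void *priv, u64 addr, u64 range,
			struct drm_gem_object *obj, u64 offset);
	int (*step_remap)(void *priv, u64 new_addr, u64 new_range);
	int (*step_unmap)(void *priv, u64 addr, u64 range);
};

int gpuva_sm_map(struct drm_gpuva_manager *mgr,
		 const struct gpuva_sm_ops *ops, void *priv,
		 u64 addr, u64 range,
		 struct drm_gem_object *obj, u64 offset);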

Regarding the locking, anything specific that makes it look suspicious 
to you?

> 
> Thanks,
> 
> Thomas
> 
> 
>> +    nouveau_uvmm_lock(uvmm);
>> +    list_for_each_op(op, &bind_job->ops) {
>> +        switch (op->op) {
>> +        case OP_ALLOC: {
>> +            bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
>> +
>> +            ret = nouveau_uvma_region_new(uvmm,
>> +                              op->va.addr,
>> +                              op->va.range,
>> +                              sparse);
>> +            if (ret)
>> +                goto out_unlock;
>> +            break;
>> +        }
>> +        case OP_FREE:
>> +            ret = nouveau_uvma_region_destroy(uvmm,
>> +                              op->va.addr,
>> +                              op->va.range);
>> +            if (ret)
>> +                goto out_unlock;
>> +            break;
>> +        case OP_MAP:
>> +            ret = nouveau_uvmm_sm_map(uvmm,
>> +                          op->va.addr, op->va.range,
>> +                          op->gem.obj, op->gem.offset,
>> +                          op->flags && 0xff);
>> +            if (ret)
>> +                goto out_unlock;
>> +            break;
>> +        case OP_UNMAP:
>> +            ret = nouveau_uvmm_sm_unmap(uvmm,
>> +                            op->va.addr,
>> +                            op->va.range);
>> +            if (ret)
>> +                goto out_unlock;
>> +            break;
>> +        }
>> +    }
>> +
>> +out_unlock:
>> +    nouveau_uvmm_unlock(uvmm);
>> +    if (ret)
>> +        NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
>> +    return ERR_PTR(ret);
>> +}
>> +
>> +static void
>> +nouveau_bind_job_free(struct nouveau_job *job)
>> +{
>> +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
>> +    struct bind_job_op *op, *next;
>> +
>> +    list_for_each_op_safe(op, next, &bind_job->ops) {
>> +        struct drm_gem_object *obj = op->gem.obj;
>> +
>> +        if (obj)
>> +            drm_gem_object_put(obj);
>> +
>> +        list_del(&op->entry);
>> +        kfree(op);
>> +    }
>> +
>> +    nouveau_base_job_free(job);
>> +    kfree(bind_job);
>> +}
>> +
>> +static struct nouveau_job_ops nouveau_bind_job_ops = {
>> +    .submit = nouveau_bind_job_submit,
>> +    .run = nouveau_bind_job_run,
>> +    .free = nouveau_bind_job_free,
>> +};
>> +
>> +static int
>> +bind_job_op_from_uop(struct bind_job_op **pop,
>> +             struct drm_nouveau_vm_bind_op *uop)
>> +{
>> +    struct bind_job_op *op;
>> +
>> +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +    if (!op)
>> +        return -ENOMEM;
>> +
>> +    op->op = uop->op;
>> +    op->flags = uop->flags;
>> +    op->va.addr = uop->addr;
>> +    op->va.range = uop->range;
>> +
>> +    if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) {
>> +        op->gem.handle = uop->handle;
>> +        op->gem.offset = uop->bo_offset;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>> +bind_job_ops_free(struct list_head *ops)
>> +{
>> +    struct bind_job_op *op, *next;
>> +
>> +    list_for_each_op_safe(op, next, ops) {
>> +        list_del(&op->entry);
>> +        kfree(op);
>> +    }
>> +}
>> +
>> +int
>> +nouveau_bind_job_init(struct nouveau_bind_job **pjob,
>> +              struct nouveau_exec_bind *bind)
>> +{
>> +    struct nouveau_bind_job *job;
>> +    struct bind_job_op *op;
>> +    int i, ret;
>> +
>> +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
>> +    if (!job)
>> +        return -ENOMEM;
>> +
>> +    INIT_LIST_HEAD(&job->ops);
>> +
>> +    for (i = 0; i < bind->op.count; i++) {
>> +        ret = bind_job_op_from_uop(&op, &bind->op.s[i]);
>> +        if (ret)
>> +            goto err_free;
>> +
>> +        list_add_tail(&op->entry, &job->ops);
>> +    }
>> +
>> +    job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
>> +    job->base.ops = &nouveau_bind_job_ops;
>> +
>> +    ret = nouveau_base_job_init(&job->base, &bind->base);
>> +    if (ret)
>> +        goto err_free;
>> +
>> +    return 0;
>> +
>> +err_free:
>> +    bind_job_ops_free(&job->ops);
>> +    kfree(job);
>> +    *pjob = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +static int
>> +sync_find_fence(struct nouveau_job *job,
>> +        struct drm_nouveau_sync *sync,
>> +        struct dma_fence **fence)
>> +{
>> +    u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
>> +    u64 point = 0;
>> +    int ret;
>> +
>> +    if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
>> +        stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>> +        return -EOPNOTSUPP;
>> +
>> +    if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>> +        point = sync->timeline_value;
>> +
>> +    ret = drm_syncobj_find_fence(job->file_priv,
>> +                     sync->handle, point,
>> +                     sync->flags, fence);
>> +    if (ret)
>> +        return ret;
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +exec_job_binds_wait(struct nouveau_job *job)
>> +{
>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>> +    struct nouveau_cli *cli = exec_job->base.cli;
>> +    struct nouveau_sched_entity *bind_entity = &cli->sched_entity;
>> +    signed long ret;
>> +    int i;
>> +
>> +    for (i = 0; i < job->in_sync.count; i++) {
>> +        struct nouveau_job *it;
>> +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
>> +        struct dma_fence *fence;
>> +        bool found;
>> +
>> +        ret = sync_find_fence(job, sync, &fence);
>> +        if (ret)
>> +            return ret;
>> +
>> +        mutex_lock(&bind_entity->job.mutex);
>> +        found = false;
>> +        list_for_each_entry(it, &bind_entity->job.list, head) {
>> +            if (fence == it->done_fence) {
>> +                found = true;
>> +                break;
>> +            }
>> +        }
>> +        mutex_unlock(&bind_entity->job.mutex);
>> +
>> +        /* If the fence is not from a VM_BIND job, don't wait for it. */
>> +        if (!found)
>> +            continue;
>> +
>> +        ret = dma_fence_wait_timeout(fence, true,
>> +                         msecs_to_jiffies(500));
>> +        if (ret < 0)
>> +            return ret;
>> +        else if (ret == 0)
>> +            return -ETIMEDOUT;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +int
>> +nouveau_exec_job_submit(struct nouveau_job *job)
>> +{
>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>> +    struct nouveau_cli *cli = exec_job->base.cli;
>> +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
>> +    struct drm_exec *exec = &job->exec;
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +    int ret;
>> +
>> +    ret = exec_job_binds_wait(job);
>> +    if (ret)
>> +        return ret;
>> +
>> +    nouveau_uvmm_lock(uvmm);
>> +    drm_exec_while_not_all_locked(exec) {
>> +        struct drm_gpuva *va;
>> +
>> +        drm_gpuva_for_each_va(va, &uvmm->umgr) {
>> +            ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
>> +            drm_exec_break_on_contention(exec);
>> +            if (ret)
>> +                return ret;
>> +        }
>> +    }
>> +    nouveau_uvmm_unlock(uvmm);
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        struct dma_resv *resv = obj->resv;
>> +        struct nouveau_bo *nvbo = nouveau_gem_object(obj);
>> +
>> +        ret = nouveau_bo_validate(nvbo, true, false);
>> +        if (ret)
>> +            return ret;
>> +
>> +        dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static struct dma_fence *
>> +nouveau_exec_job_run(struct nouveau_job *job)
>> +{
>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>> +    struct nouveau_fence *fence;
>> +    int i, ret;
>> +
>> +    ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
>> +    if (ret) {
>> +        NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
>> +        return ERR_PTR(ret);
>> +    }
>> +
>> +    for (i = 0; i < exec_job->push.count; i++) {
>> +        nv50_dma_push(job->chan, exec_job->push.s[i].va,
>> +                  exec_job->push.s[i].va_len);
>> +    }
>> +
>> +    ret = nouveau_fence_new(job->chan, false, &fence);
>> +    if (ret) {
>> +        NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
>> +        WIND_RING(job->chan);
>> +        return ERR_PTR(ret);
>> +    }
>> +
>> +    return &fence->base;
>> +}
>> +static void
>> +nouveau_exec_job_free(struct nouveau_job *job)
>> +{
>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>> +
>> +    nouveau_base_job_free(job);
>> +
>> +    kfree(exec_job->push.s);
>> +    kfree(exec_job);
>> +}
>> +
>> +static struct nouveau_job_ops nouveau_exec_job_ops = {
>> +    .submit = nouveau_exec_job_submit,
>> +    .run = nouveau_exec_job_run,
>> +    .free = nouveau_exec_job_free,
>> +};
>> +
>> +int
>> +nouveau_exec_job_init(struct nouveau_exec_job **pjob,
>> +              struct nouveau_exec *exec)
>> +{
>> +    struct nouveau_exec_job *job;
>> +    int ret;
>> +
>> +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
>> +    if (!job)
>> +        return -ENOMEM;
>> +
>> +    job->push.count = exec->push.count;
>> +    job->push.s = kmemdup(exec->push.s,
>> +                  sizeof(*exec->push.s) *
>> +                  exec->push.count,
>> +                  GFP_KERNEL);
>> +    if (!job->push.s) {
>> +        ret = -ENOMEM;
>> +        goto err_free_job;
>> +    }
>> +
>> +    job->base.ops = &nouveau_exec_job_ops;
>> +    ret = nouveau_base_job_init(&job->base, &exec->base);
>> +    if (ret)
>> +        goto err_free_pushs;
>> +
>> +    return 0;
>> +
>> +err_free_pushs:
>> +    kfree(job->push.s);
>> +err_free_job:
>> +    kfree(job);
>> +    *pjob = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +void nouveau_job_fini(struct nouveau_job *job)
>> +{
>> +    dma_fence_put(job->done_fence);
>> +    drm_sched_job_cleanup(&job->base);
>> +    job->ops->free(job);
>> +}
>> +
>> +static int
>> +nouveau_job_add_deps(struct nouveau_job *job)
>> +{
>> +    struct dma_fence *in_fence = NULL;
>> +    int ret, i;
>> +
>> +    for (i = 0; i < job->in_sync.count; i++) {
>> +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
>> +
>> +        ret = sync_find_fence(job, sync, &in_fence);
>> +        if (ret) {
>> +            NV_PRINTK(warn, job->cli,
>> +                  "Failed to find syncobj (-> in): handle=%d\n",
>> +                  sync->handle);
>> +            return ret;
>> +        }
>> +
>> +        ret = drm_sched_job_add_dependency(&job->base, in_fence);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence *fence)
>> +{
>> +    struct drm_syncobj *out_sync;
>> +    int i;
>> +
>> +    for (i = 0; i < job->out_sync.count; i++) {
>> +        struct drm_nouveau_sync *sync = &job->out_sync.s[i];
>> +        u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
>> +
>> +        if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
>> +            stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>> +            return -EOPNOTSUPP;
>> +
>> +        out_sync = drm_syncobj_find(job->file_priv, sync->handle);
>> +        if (!out_sync) {
>> +            NV_PRINTK(warn, job->cli,
>> +                  "Failed to find syncobj (-> out): handle=%d\n",
>> +                  sync->handle);
>> +            return -ENOENT;
>> +        }
>> +
>> +        if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
>> +            struct dma_fence_chain *chain;
>> +
>> +            chain = dma_fence_chain_alloc();
>> +            if (!chain) {
>> +                drm_syncobj_put(out_sync);
>> +                return -ENOMEM;
>> +            }
>> +
>> +            drm_syncobj_add_point(out_sync, chain, fence,
>> +                          sync->timeline_value);
>> +        } else {
>> +            drm_syncobj_replace_fence(out_sync, fence);
>> +        }
>> +
>> +        drm_syncobj_put(out_sync);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static struct dma_fence *
>> +nouveau_job_run(struct nouveau_job *job)
>> +{
>> +    return job->ops->run(job);
>> +}
>> +
>> +static int
>> +nouveau_job_run_sync(struct nouveau_job *job)
>> +{
>> +    struct dma_fence *fence;
>> +    int ret;
>> +
>> +    fence = nouveau_job_run(job);
>> +    if (IS_ERR(fence)) {
>> +        return PTR_ERR(fence);
>> +    } else if (fence) {
>> +        ret = dma_fence_wait(fence, true);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    dma_fence_signal(job->done_fence);
>> +
>> +    return 0;
>> +}
>> +
>> +int
>> +nouveau_job_submit(struct nouveau_job *job)
>> +{
>> +    struct nouveau_sched_entity *entity = to_nouveau_sched_entity(job->base.entity);
>> +    int ret;
>> +
>> +    drm_exec_init(&job->exec, true);
>> +
>> +    ret = nouveau_job_add_deps(job);
>> +    if (ret)
>> +        goto out;
>> +
>> +    drm_sched_job_arm(&job->base);
>> +    job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>> +
>> +    ret = nouveau_job_fence_attach(job, job->done_fence);
>> +    if (ret)
>> +        goto out;
>> +
>> +    if (job->ops->submit) {
>> +        ret = job->ops->submit(job);
>> +        if (ret)
>> +            goto out;
>> +    }
>> +
>> +    if (job->sync) {
>> +        drm_exec_fini(&job->exec);
>> +
>> +        /* We're requested to run a synchronous job, hence don't push
>> +         * the job, bypassing the job scheduler, and execute the jobs
>> +         * run() function right away.
>> +         *
>> +         * As a consequence of bypassing the job scheduler we need to
>> +         * handle fencing and job cleanup ourselves.
>> +         */
>> +        ret = nouveau_job_run_sync(job);
>> +
>> +        /* If the job fails, the caller will do the cleanup for us. */
>> +        if (!ret)
>> +            nouveau_job_fini(job);
>> +
>> +        return ret;
>> +    } else {
>> +        mutex_lock(&entity->job.mutex);
>> +        drm_sched_entity_push_job(&job->base);
>> +        list_add_tail(&job->head, &entity->job.list);
>> +        mutex_unlock(&entity->job.mutex);
>> +    }
>> +
>> +out:
>> +    drm_exec_fini(&job->exec);
>> +    return ret;
>> +}
>> +
>> +static struct dma_fence *
>> +nouveau_sched_run_job(struct drm_sched_job *sched_job)
>> +{
>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>> +
>> +    return nouveau_job_run(job);
>> +}
>> +
>> +static enum drm_gpu_sched_stat
>> +nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
>> +{
>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>> +    struct nouveau_channel *chan = job->chan;
>> +
>> +    if (unlikely(!atomic_read(&chan->killed)))
>> +        nouveau_channel_kill(chan);
>> +
>> +    NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
>> +          chan->chid);
>> +
>> +    nouveau_sched_entity_fini(job->entity);
>> +
>> +    return DRM_GPU_SCHED_STAT_ENODEV;
>> +}
>> +
>> +static void
>> +nouveau_sched_free_job(struct drm_sched_job *sched_job)
>> +{
>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>> +    struct nouveau_sched_entity *entity = job->entity;
>> +
>> +    mutex_lock(&entity->job.mutex);
>> +    list_del(&job->head);
>> +    mutex_unlock(&entity->job.mutex);
>> +
>> +    nouveau_job_fini(job);
>> +}
>> +
>> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
>> +                  struct drm_gpu_scheduler *sched)
>> +{
>> +
>> +    INIT_LIST_HEAD(&entity->job.list);
>> +    mutex_init(&entity->job.mutex);
>> +
>> +    return drm_sched_entity_init(&entity->base,
>> +                     DRM_SCHED_PRIORITY_NORMAL,
>> +                     &sched, 1, NULL);
>> +}
>> +
>> +void
>> +nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
>> +{
>> +    drm_sched_entity_destroy(&entity->base);
>> +}
>> +
>> +static const struct drm_sched_backend_ops nouveau_sched_ops = {
>> +    .run_job = nouveau_sched_run_job,
>> +    .timedout_job = nouveau_sched_timedout_job,
>> +    .free_job = nouveau_sched_free_job,
>> +};
>> +
>> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
>> +               struct nouveau_drm *drm)
>> +{
>> +    long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
>> +
>> +    return drm_sched_init(sched, &nouveau_sched_ops,
>> +                  NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
>> +                  NULL, NULL, "nouveau", drm->dev->dev);
>> +}
>> +
>> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
>> +{
>> +    drm_sched_fini(sched);
>> +}
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
>> new file mode 100644
>> index 000000000000..7fc5b7eea810
>> --- /dev/null
>> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
>> @@ -0,0 +1,98 @@
>> +// SPDX-License-Identifier: MIT
>> +
>> +#ifndef NOUVEAU_SCHED_H
>> +#define NOUVEAU_SCHED_H
>> +
>> +#include <linux/types.h>
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/gpu_scheduler.h>
>> +
>> +#include "nouveau_drv.h"
>> +#include "nouveau_exec.h"
>> +
>> +#define to_nouveau_job(sched_job)        \
>> +        container_of((sched_job), struct nouveau_job, base)
>> +
>> +#define to_nouveau_exec_job(job)        \
>> +        container_of((job), struct nouveau_exec_job, base)
>> +
>> +#define to_nouveau_bind_job(job)        \
>> +        container_of((job), struct nouveau_bind_job, base)
>> +
>> +struct nouveau_job {
>> +    struct drm_sched_job base;
>> +    struct list_head head;
>> +
>> +    struct nouveau_sched_entity *entity;
>> +
>> +    struct drm_file *file_priv;
>> +    struct nouveau_cli *cli;
>> +    struct nouveau_channel *chan;
>> +
>> +    struct drm_exec exec;
>> +    struct dma_fence *done_fence;
>> +
>> +    bool sync;
>> +
>> +    struct {
>> +        struct drm_nouveau_sync *s;
>> +        u32 count;
>> +    } in_sync;
>> +
>> +    struct {
>> +        struct drm_nouveau_sync *s;
>> +        u32 count;
>> +    } out_sync;
>> +
>> +    struct nouveau_job_ops {
>> +        int (*submit)(struct nouveau_job *);
>> +        struct dma_fence *(*run)(struct nouveau_job *);
>> +        void (*free)(struct nouveau_job *);
>> +    } *ops;
>> +};
>> +
>> +struct nouveau_exec_job {
>> +    struct nouveau_job base;
>> +
>> +    struct {
>> +        struct drm_nouveau_exec_push *s;
>> +        u32 count;
>> +    } push;
>> +};
>> +
>> +struct nouveau_bind_job {
>> +    struct nouveau_job base;
>> +
>> +    /* struct bind_job_op */
>> +    struct list_head ops;
>> +};
>> +
>> +int nouveau_bind_job_init(struct nouveau_bind_job **job,
>> +              struct nouveau_exec_bind *bind);
>> +int nouveau_exec_job_init(struct nouveau_exec_job **job,
>> +              struct nouveau_exec *exec);
>> +
>> +int nouveau_job_submit(struct nouveau_job *job);
>> +void nouveau_job_fini(struct nouveau_job *job);
>> +
>> +#define to_nouveau_sched_entity(entity)        \
>> +        container_of((entity), struct nouveau_sched_entity, base)
>> +
>> +struct nouveau_sched_entity {
>> +    struct drm_sched_entity base;
>> +    struct {
>> +        struct list_head list;
>> +        struct mutex mutex;
>> +    } job;
>> +};
>> +
>> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
>> +                  struct drm_gpu_scheduler *sched);
>> +void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
>> +
>> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
>> +               struct nouveau_drm *drm);
>> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
>> +
>> +#endif
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-18 19:48                 ` Christian König
@ 2023-01-19  4:04                   ` Danilo Krummrich
  2023-01-19  5:23                     ` Matthew Brost
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-19  4:04 UTC (permalink / raw)
  To: Christian König, Dave Airlie, Alex Deucher
  Cc: tzimmermann, corbet, nouveau, dri-devel, linux-doc, linux-kernel,
	bskeggs, jason, airlied

On 1/18/23 20:48, Christian König wrote:
> Am 18.01.23 um 20:17 schrieb Dave Airlie:
>> On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
>>> On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich <dakr@redhat.com> 
>>> wrote:
>>>>
>>>>
>>>> On 1/18/23 17:30, Alex Deucher wrote:
>>>>> On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich <dakr@redhat.com> 
>>>>> wrote:
>>>>>> On 1/18/23 16:37, Christian König wrote:
>>>>>>> Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> On 1/18/23 09:53, Christian König wrote:
>>>>>>>>> Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
>>>>>>>>>> This patch series provides a new UAPI for the Nouveau driver in
>>>>>>>>>> order to
>>>>>>>>>> support Vulkan features, such as sparse bindings and sparse 
>>>>>>>>>> residency.
>>>>>>>>>>
>>>>>>>>>> Furthermore, with the DRM GPUVA manager it provides a new DRM 
>>>>>>>>>> core
>>>>>>>>>> feature to
>>>>>>>>>> keep track of GPU virtual address (VA) mappings in a more 
>>>>>>>>>> generic way.
>>>>>>>>>>
>>>>>>>>>> The DRM GPUVA manager is indented to help drivers implement
>>>>>>>>>> userspace-manageable
>>>>>>>>>> GPU VA spaces in reference to the Vulkan API. In order to achieve
>>>>>>>>>> this goal it
>>>>>>>>>> serves the following purposes in this context.
>>>>>>>>>>
>>>>>>>>>>        1) Provide a dedicated range allocator to track GPU VA
>>>>>>>>>> allocations and
>>>>>>>>>>           mappings, making use of the drm_mm range allocator.
>>>>>>>>> This means that the ranges are allocated by the kernel? If yes 
>>>>>>>>> that's
>>>>>>>>> a really really bad idea.
>>>>>>>> No, it's just for keeping track of the ranges userspace has 
>>>>>>>> allocated.
>>>>>>> Ok, that makes more sense.
>>>>>>>
>>>>>>> So basically you have an IOCTL which asks kernel for a free 
>>>>>>> range? Or
>>>>>>> what exactly is the drm_mm used for here?
>>>>>> Not even that, userspace provides both the base address and the 
>>>>>> range,
>>>>>> the kernel really just keeps track of things. Though, writing a 
>>>>>> UAPI on
>>>>>> top of the GPUVA manager asking for a free range instead would be
>>>>>> possible by just adding the corresponding wrapper functions to get a
>>>>>> free hole.
>>>>>>
>>>>>> Currently, and that's what I think I read out of your question, 
>>>>>> the main
>>>>>> benefit of using drm_mm over simply stuffing the entries into a 
>>>>>> list or
>>>>>> something boils down to easier collision detection and iterating
>>>>>> sub-ranges of the whole VA space.
>>>>> Why not just do this in userspace?  We have a range manager in
>>>>> libdrm_amdgpu that you could lift out into libdrm or some other
>>>>> helper.
>>>> The kernel still needs to keep track of the mappings within the various
>>>> VA spaces, e.g. it silently needs to unmap mappings that are backed by
>>>> BOs that get evicted and remap them once they're validated (or swapped
>>>> back in).
>>> Ok, you are just using this for maintaining the GPU VM space in the 
>>> kernel.
>>>
>> Yes the idea behind having common code wrapping drm_mm for this is to
>> allow us to make the rules consistent across drivers.
>>
>> Userspace (generally Vulkan, some compute) has interfaces that pretty
>> much dictate a lot of how VMA tracking works, esp around lifetimes,
>> sparse mappings and splitting/merging underlying page tables, I'd
>> really like this to be more consistent across drivers, because already
>> I think we've seen with freedreno some divergence from amdgpu and we
>> also have i915/xe to deal with. I'd like to at least have one place
>> that we can say this is how it should work, since this is something
>> that *should* be consistent across drivers mostly, as it is more about
>> how the uapi is exposed.
> 
> That's a really good idea, but the implementation with drm_mm won't work 
> like that.
> 
> We have Vulkan applications which use the sparse feature to create 
> literally millions of mappings. That's why I have fine tuned the mapping 
> structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle 
> possible in the handling of that.

That's valuable information. Can you recommend such an application for 
testing / benchmarking?

Your optimization effort sounds great. Might it be worth generalizing your 
approach on its own and stacking the drm_gpuva_manager on top of it?
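
Just to make sure I understand the kind of structure you have in mind: 
something roughly like the sketch below per mapping - an interval tree node 
plus the minimal per-mapping state - rather than embedding a full 
drm_mm_node? The struct and field names are purely illustrative, not 
amdgpu's actual layout.

struct gpuva_node {
	struct rb_node rb;		/* interval tree linkage */
	u64 va_addr;			/* start of the mapping in GPU VA */
	u64 va_range;			/* size of the mapping */
	u64 bo_offset;			/* offset into the backing BO */
	struct drm_gem_object *obj;	/* backing GEM object, NULL if sparse */
	u32 flags;
};

On a 64-bit kernel that is somewhere around 64 bytes, which sounds much 
closer to what you describe than the ~200 bytes of a drm_mm_node.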

> 
> A drm_mm_node is more in the range of ~200 bytes and certainly not 
> suitable for this kind of job.
> 
> I strongly suggest to rather use a good bunch of the amdgpu VM code as 
> blueprint for the common infrastructure.

I will definitely have a look.

> 
> Regards,
> Christian.
> 
>>
>> Dave.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
@ 2023-01-19  4:14   ` Bagas Sanjaya
  2023-01-20 18:32     ` Danilo Krummrich
  2023-01-23 23:23   ` Niranjana Vishwanathapura
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 75+ messages in thread
From: Bagas Sanjaya @ 2023-01-19  4:14 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, christian.koenig, bskeggs,
	jason, tzimmermann, mripard, corbet
  Cc: dri-devel, nouveau, linux-doc, linux-kernel

On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
> This adds the infrastructure for a manager implementation to keep track
> of GPU virtual address (VA) mappings.

"Add infrastructure for ..."

> + * Analogue to drm_gpuva_sm_map_ops_create() drm_gpuva_sm_unmap_ops_create()
> + * provides drivers a the list of operations to be executed in order to unmap
> + * a range of GPU VA space. The logic behind this functions is way simpler
> + * though: For all existent mappings enclosed by the given range unmap
> + * operations are created. For mappings which are only partically located within
> + * the given range, remap operations are created such that those mappings are
> + * split up and re-mapped partically.

"Analogous to ..."

> + *
> + * The following paragraph depicts the basic constellations of existent GPU VA
> + * mappings, a newly requested mapping and the resulting mappings as implemented
> + * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
> + * of those constellations.
> + *
> + * ::
> + *
> + *	1) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	2) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=m)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=m)
> + *
> + *
> + *	3) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	4) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0  a  1
> + *	old: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or bo_offset.
> + *
> + *
> + *	5) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  b  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0  b  1  a' 2
> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	6) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  a  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	7) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  b  2
> + *	req:       |-----| (bo_offset=m)
> + *
> + *	     0  a  1  b  2
> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *	8) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	      0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  a  2
> + *	req:       |-----| (bo_offset=n+1)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	9) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------|       (bo_offset=n)
> + *
> + *	           1     b     3
> + *	req:       |-----------| (bo_offset=m)
> + *
> + *	     0  a  1     b     3
> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *	10) Existent mapping is merged.
> + *	-------------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------|       (bo_offset=n)
> + *
> + *	           1     a     3
> + *	req:       |-----------| (bo_offset=n+1)
> + *
> + *	     0        a        3
> + *	new: |-----------------| (bo_offset=n)
> + *
> + *
> + *	11) Existent mapping is split.
> + *	------------------------------
> + *
> + *	     0        a        3
> + *	old: |-----------------| (bo_offset=n)
> + *
> + *	           1  b  2
> + *	req:       |-----|       (bo_offset=m)
> + *
> + *	     0  a  1  b  2  a' 3
> + *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
> + *
> + *
> + *	12) Existent mapping is kept.
> + *	-----------------------------
> + *
> + *	     0        a        3
> + *	old: |-----------------| (bo_offset=n)
> + *
> + *	           1  a  2
> + *	req:       |-----|       (bo_offset=n+1)
> + *
> + *	     0        a        3
> + *	old: |-----------------| (bo_offset=n)
> + *
> + *
> + *	13) Existent mapping is replaced.
> + *	---------------------------------
> + *
> + *	           1  a  2
> + *	old:       |-----| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	14) Existent mapping is replaced.
> + *	---------------------------------
> + *
> + *	           1  a  2
> + *	old:       |-----| (bo_offset=n)
> + *
> + *	     0        a       3
> + *	req: |----------------| (bo_offset=n)
> + *
> + *	     0        a       3
> + *	new: |----------------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	15) Existent mapping is split.
> + *	------------------------------
> + *
> + *	           1     a     3
> + *	old:       |-----------| (bo_offset=n)
> + *
> + *	     0     b     2
> + *	req: |-----------|       (bo_offset=m)
> + *
> + *	     0     b     2  a' 3
> + *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> + *
> + *
> + *	16) Existent mappings are merged.
> + *	---------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------|                        (bo_offset=n)
> + *
> + *	                            2     a     3
> + *	old':                       |-----------| (bo_offset=n+2)
> + *
> + *	                1     a     2
> + *	req:            |-----------|             (bo_offset=n+1)
> + *
> + *	                      a
> + *	new: |----------------------------------| (bo_offset=n)
> + */

Factor out lists from the big code block above:

---- >8 ----

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index e665f642689d03..411c0aa80bfa1f 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -129,15 +129,14 @@
  * the given range, remap operations are created such that those mappings are
  * split up and re-mapped partically.
  *
- * The following paragraph depicts the basic constellations of existent GPU VA
+ * The following diagram depicts the basic relationships of existent GPU VA
  * mappings, a newly requested mapping and the resulting mappings as implemented
- * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
- * of those constellations.
+ * by drm_gpuva_sm_map_ops_create()  - it doesn't cover any arbitrary
+ * combinations of these.
  *
- * ::
- *
- *	1) Existent mapping is kept.
- *	----------------------------
+ * 1) Existent mapping is kept.
+ * 
+ *    ::
  *
  *	     0     a     1
  *	old: |-----------| (bo_offset=n)
@@ -149,8 +148,9 @@
  *	new: |-----------| (bo_offset=n)
  *
  *
- *	2) Existent mapping is replaced.
- *	--------------------------------
+ * 2) Existent mapping is replaced.
+ *
+ *    ::
  *
  *	     0     a     1
  *	old: |-----------| (bo_offset=n)
@@ -162,8 +162,9 @@
  *	new: |-----------| (bo_offset=m)
  *
  *
- *	3) Existent mapping is replaced.
- *	--------------------------------
+ * 3) Existent mapping is replaced.
+ *
+ *    ::
  *
  *	     0     a     1
  *	old: |-----------| (bo_offset=n)
@@ -175,8 +176,9 @@
  *	new: |-----------| (bo_offset=n)
  *
  *
- *	4) Existent mapping is replaced.
- *	--------------------------------
+ * 4) Existent mapping is replaced.
+ *
+ *    ::
  *
  *	     0  a  1
  *	old: |-----|       (bo_offset=n)
@@ -187,12 +189,14 @@
  *	     0     a     2
  *	new: |-----------| (bo_offset=n)
  *
- *	Note: We expect to see the same result for a request with a different bo
- *	      and/or bo_offset.
+ *    .. note::
+ *       We expect to see the same result for a request with a different bo
+ *       and/or bo_offset.
  *
  *
- *	5) Existent mapping is split.
- *	-----------------------------
+ * 5) Existent mapping is split.
+ *
+ *    ::
  *
  *	     0     a     2
  *	old: |-----------| (bo_offset=n)
@@ -203,12 +207,14 @@
  *	     0  b  1  a' 2
  *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
  *
- *	Note: We expect to see the same result for a request with a different bo
- *	      and/or non-contiguous bo_offset.
+ *    .. note::
+ *       We expect to see the same result for a request with a different bo
+ *       and/or non-contiguous bo_offset.
  *
  *
- *	6) Existent mapping is kept.
- *	----------------------------
+ * 6) Existent mapping is kept.
+ *
+ *    ::
  *
  *	     0     a     2
  *	old: |-----------| (bo_offset=n)
@@ -220,8 +226,9 @@
  *	new: |-----------| (bo_offset=n)
  *
  *
- *	7) Existent mapping is split.
- *	-----------------------------
+ * 7) Existent mapping is split.
+ *
+ *    ::
  *
  *	     0     a     2
  *	old: |-----------| (bo_offset=n)
@@ -233,8 +240,9 @@
  *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
  *
  *
- *	8) Existent mapping is kept.
- *	----------------------------
+ * 8) Existent mapping is kept.
+ *
+ *    ::
  *
  *	      0     a     2
  *	old: |-----------| (bo_offset=n)
@@ -246,8 +254,9 @@
  *	new: |-----------| (bo_offset=n)
  *
  *
- *	9) Existent mapping is split.
- *	-----------------------------
+ * 9) Existent mapping is split.
+ *
+ *    ::
  *
  *	     0     a     2
  *	old: |-----------|       (bo_offset=n)
@@ -259,104 +268,113 @@
  *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
  *
  *
- *	10) Existent mapping is merged.
- *	-------------------------------
+ * 10) Existent mapping is merged.
  *
- *	     0     a     2
- *	old: |-----------|       (bo_offset=n)
+ *     ::
  *
- *	           1     a     3
- *	req:       |-----------| (bo_offset=n+1)
+ *	      0     a     2
+ *	 old: |-----------|       (bo_offset=n)
  *
- *	     0        a        3
- *	new: |-----------------| (bo_offset=n)
+ *	            1     a     3
+ *	 req:       |-----------| (bo_offset=n+1)
+ *
+ *	      0        a        3
+ *	 new: |-----------------| (bo_offset=n)
  *
  *
- *	11) Existent mapping is split.
- *	------------------------------
+ * 11) Existent mapping is split.
  *
- *	     0        a        3
- *	old: |-----------------| (bo_offset=n)
+ *     ::
  *
- *	           1  b  2
- *	req:       |-----|       (bo_offset=m)
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
  *
- *	     0  a  1  b  2  a' 3
- *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
+ *	            1  b  2
+ *	 req:       |-----|       (bo_offset=m)
+ *
+ *	      0  a  1  b  2  a' 3
+ *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
  *
  *
- *	12) Existent mapping is kept.
- *	-----------------------------
+ * 12) Existent mapping is kept.
  *
- *	     0        a        3
- *	old: |-----------------| (bo_offset=n)
+ *     ::
  *
- *	           1  a  2
- *	req:       |-----|       (bo_offset=n+1)
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
  *
- *	     0        a        3
- *	old: |-----------------| (bo_offset=n)
+ *	            1  a  2
+ *	 req:       |-----|       (bo_offset=n+1)
+ *
+ *	      0        a        3
+ *	 old: |-----------------| (bo_offset=n)
  *
  *
- *	13) Existent mapping is replaced.
- *	---------------------------------
+ * 13) Existent mapping is replaced.
  *
- *	           1  a  2
- *	old:       |-----| (bo_offset=n)
+ *     ::
  *
- *	     0     a     2
- *	req: |-----------| (bo_offset=n)
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n)
  *
- *	     0     a     2
- *	new: |-----------| (bo_offset=n)
+ *	      0     a     2
+ *	 req: |-----------| (bo_offset=n)
  *
- *	Note: We expect to see the same result for a request with a different bo
- *	      and/or non-contiguous bo_offset.
+ *	      0     a     2
+ *	 new: |-----------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different bo
+ *        and/or non-contiguous bo_offset.
  *
  *
- *	14) Existent mapping is replaced.
- *	---------------------------------
+ * 14) Existent mapping is replaced.
  *
- *	           1  a  2
- *	old:       |-----| (bo_offset=n)
+ *     ::
  *
- *	     0        a       3
- *	req: |----------------| (bo_offset=n)
+ *	            1  a  2
+ *	 old:       |-----| (bo_offset=n)
  *
- *	     0        a       3
- *	new: |----------------| (bo_offset=n)
+ *	      0        a       3
+ *	 req: |----------------| (bo_offset=n)
  *
- *	Note: We expect to see the same result for a request with a different bo
- *	      and/or non-contiguous bo_offset.
+ *	      0        a       3
+ *	 new: |----------------| (bo_offset=n)
+ *
+ *     .. note::
+ *        We expect to see the same result for a request with a different bo
+ *        and/or non-contiguous bo_offset.
  *
  *
- *	15) Existent mapping is split.
- *	------------------------------
+ * 15) Existent mapping is split.
  *
- *	           1     a     3
- *	old:       |-----------| (bo_offset=n)
+ *     ::
  *
- *	     0     b     2
- *	req: |-----------|       (bo_offset=m)
+ *	            1     a     3
+ *	 old:       |-----------| (bo_offset=n)
  *
- *	     0     b     2  a' 3
- *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
+ *	      0     b     2
+ *	 req: |-----------|       (bo_offset=m)
+ *
+ *	      0     b     2  a' 3
+ *	 new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
  *
  *
- *	16) Existent mappings are merged.
- *	---------------------------------
+ * 16) Existent mappings are merged.
  *
- *	     0     a     1
- *	old: |-----------|                        (bo_offset=n)
+ *     ::
  *
- *	                            2     a     3
- *	old':                       |-----------| (bo_offset=n+2)
+ *	      0     a     1
+ *	 old: |-----------|                        (bo_offset=n)
  *
- *	                1     a     2
- *	req:            |-----------|             (bo_offset=n+1)
+ *	                             2     a     3
+ *	 old':                       |-----------| (bo_offset=n+2)
  *
- *	                      a
- *	new: |----------------------------------| (bo_offset=n)
+ *	                 1     a     2
+ *	 req:            |-----------|             (bo_offset=n+1)
+ *
+ *	                       a
+ *	 new: |----------------------------------| (bo_offset=n)
  */
 
 /**

However, the relationship scenario descriptions are too generic (different
diagrams are described by the same text). Please rewrite them, taking into
account bo_offset values in each scenario.
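
For example, case 2 could read something along these lines (just a 
suggestion, exact wording up to you):

 * 2) Existent mapping is replaced because the request covers the same VA
 *    range with the same backing bo, but with a different bo_offset (m
 *    instead of n).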

Thanks.

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-19  3:44     ` Danilo Krummrich
@ 2023-01-19  4:58       ` Matthew Brost
  2023-01-19  7:32         ` Thomas Hellström (Intel)
  2023-01-20 10:08         ` Boris Brezillon
  0 siblings, 2 replies; 75+ messages in thread
From: Matthew Brost @ 2023-01-19  4:58 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Thomas Hellström (Intel),
	daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > 
> > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > This commit provides the implementation for the new uapi motivated by the
> > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > 
> > > 1) Initialize a GPU virtual address (VA) space via the new
> > >     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
> > >     space managed by the kernel and userspace, respectively.
> > > 
> > > 2) Allocate and free a VA space region as well as bind and unbind memory
> > >     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> > >     UMDs can request the named operations to be processed either
> > >     synchronously or asynchronously. It supports DRM syncobjs
> > >     (incl. timelines) as synchronization mechanism. The management of the
> > >     GPU VA mappings is implemented with the DRM GPU VA manager.
> > > 
> > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
> > >     execution happens asynchronously. It supports DRM syncobj (incl.
> > >     timelines) as synchronization mechanism. DRM GEM object locking is
> > >     handled with drm_exec.
> > > 
> > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
> > > GPU scheduler for the asynchronous paths.
> > > 
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >   Documentation/gpu/driver-uapi.rst       |   3 +
> > >   drivers/gpu/drm/nouveau/Kbuild          |   2 +
> > >   drivers/gpu/drm/nouveau/Kconfig         |   2 +
> > >   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > >   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > >   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > >   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > >   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++++++++++
> > >   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > >   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++
> > >   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > >   11 files changed, 1295 insertions(+), 4 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> > ...
> > > 
> > > +static struct dma_fence *
> > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > +    struct bind_job_op *op;
> > > +    int ret = 0;
> > > +
> > 
> > I was looking at how nouveau does the async binding compared to how xe
> > does it.
> > It looks to me that this function being a scheduler run_job callback is
> > the main part of the VM_BIND dma-fence signalling critical section for
> > the job's done_fence and if so, needs to be annotated as such?
> 
> Yes, that's the case.
> 
> > 
> > For example nouveau_uvma_region_new allocates memory, which is not
> > allowed if in a dma_fence signalling critical section and the locking
> > also looks suspicious?
> 
> Thanks for pointing this out, I missed that somehow.
> 
> I will change it to pre-allocate new regions, mappings and page tables
> within the job's submit() function.
>

Yeah, that's basically what we do in Xe: in the IOCTL step allocate all the
backing store for new page tables and populate the new page tables (these are
not yet visible in the page table structure), and in the last step, which is
executed after all the dependencies are satisfied, program all the leaf
entries, making the new binding visible.
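
In rough pseudocode (all function and struct names below are made up for
illustration, not the actual Xe code), the flow is:

/* Sketch only: helper names are invented for illustration. */
static int vm_bind_ioctl_prepare(struct bind_job *job)
{
	/* IOCTL context: allocating and blocking is fine here, since no
	 * dma-fence for this job has been published yet. */
	job->pt_backing = alloc_page_table_backing(job->va_range);
	if (!job->pt_backing)
		return -ENOMEM;

	/* Fill the new page tables; they are not linked into the page
	 * table structure yet, so nothing is visible to the GPU. */
	populate_page_tables(job->pt_backing, job->mapping);

	return 0;
}

static struct dma_fence *vm_bind_job_run(struct bind_job *job)
{
	/* Fence signalling critical section: no allocations, no blocking.
	 * All dependencies have signalled, so just write the leaf entries
	 * to make the pre-populated page tables visible. */
	program_leaf_entries(job->pt_backing);

	return job->done_fence;
}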

We screwed this up by deferring most of the IOCTL to a worker, but we will
fix it one way or another soon - either get rid of the worker or introduce a
type of sync that is signaled after the worker and publish the dma-fence in
the worker. I'd like to close on this one soon.
 
> For the ops structures the drm_gpuva_manager allocates for reporting the
> split/merge steps back to the driver I have ideas to entirely avoid
> allocations, which also is a good thing in respect of Christians feedback
> regarding the huge amount of mapping requests some applications seem to
> generate.
>

It should be fine to have allocations for reporting the split/merge steps, as
that happens before a dma-fence is published, but yeah, avoiding extra
allocations where possible is always better.

Also, BTW, great work on the drm_gpuva_manager too. We will most likely pick
this up in Xe rather than open coding all of this as we currently do. We
should probably start the port soon so we can contribute to the
implementation and get both of our drivers upstream sooner.
 
> Regarding the locking, anything specific that makes it look suspicious to
> you?
> 

I haven't looked into this too closely, but Thomas is almost certainly
suggesting that if you allocate memory anywhere under the nouveau_uvmm_lock,
then you can't take this lock in the run_job() callback, as that is in the
dma-fencing path.
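
To make that concrete, this is roughly the pattern to avoid (a minimal
sketch with invented struct/field names, not actual Nouveau code):

static int bind_ioctl(struct my_uvmm *uvmm)
{
	int ret = 0;

	mutex_lock(&uvmm->mutex);
	/* GFP_KERNEL may enter reclaim, and reclaim may wait on fences... */
	uvmm->region = kzalloc(sizeof(*uvmm->region), GFP_KERNEL);
	if (!uvmm->region)
		ret = -ENOMEM;
	mutex_unlock(&uvmm->mutex);

	return ret;
}

static struct dma_fence *bind_run_job(struct my_uvmm *uvmm, struct dma_fence *f)
{
	bool cookie = dma_fence_begin_signalling();

	/* ...while the fence signalling path takes the same mutex, so the
	 * allocation above can end up waiting on the very fence this path
	 * is supposed to signal. */
	mutex_lock(&uvmm->mutex);
	/* program page tables, etc. */
	mutex_unlock(&uvmm->mutex);

	dma_fence_end_signalling(cookie);
	return f;
}

Pre-allocating everything in the submit()/IOCTL step, as you plan above,
avoids exactly this.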

Matt 

> > 
> > Thanks,
> > 
> > Thomas
> > 
> > 
> > > +    nouveau_uvmm_lock(uvmm);
> > > +    list_for_each_op(op, &bind_job->ops) {
> > > +        switch (op->op) {
> > > +        case OP_ALLOC: {
> > > +            bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
> > > +
> > > +            ret = nouveau_uvma_region_new(uvmm,
> > > +                              op->va.addr,
> > > +                              op->va.range,
> > > +                              sparse);
> > > +            if (ret)
> > > +                goto out_unlock;
> > > +            break;
> > > +        }
> > > +        case OP_FREE:
> > > +            ret = nouveau_uvma_region_destroy(uvmm,
> > > +                              op->va.addr,
> > > +                              op->va.range);
> > > +            if (ret)
> > > +                goto out_unlock;
> > > +            break;
> > > +        case OP_MAP:
> > > +            ret = nouveau_uvmm_sm_map(uvmm,
> > > +                          op->va.addr, op->va.range,
> > > +                          op->gem.obj, op->gem.offset,
> > > +                          op->flags && 0xff);
> > > +            if (ret)
> > > +                goto out_unlock;
> > > +            break;
> > > +        case OP_UNMAP:
> > > +            ret = nouveau_uvmm_sm_unmap(uvmm,
> > > +                            op->va.addr,
> > > +                            op->va.range);
> > > +            if (ret)
> > > +                goto out_unlock;
> > > +            break;
> > > +        }
> > > +    }
> > > +
> > > +out_unlock:
> > > +    nouveau_uvmm_unlock(uvmm);
> > > +    if (ret)
> > > +        NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
> > > +    return ERR_PTR(ret);
> > > +}
> > > +
> > > +static void
> > > +nouveau_bind_job_free(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > +    struct bind_job_op *op, *next;
> > > +
> > > +    list_for_each_op_safe(op, next, &bind_job->ops) {
> > > +        struct drm_gem_object *obj = op->gem.obj;
> > > +
> > > +        if (obj)
> > > +            drm_gem_object_put(obj);
> > > +
> > > +        list_del(&op->entry);
> > > +        kfree(op);
> > > +    }
> > > +
> > > +    nouveau_base_job_free(job);
> > > +    kfree(bind_job);
> > > +}
> > > +
> > > +static struct nouveau_job_ops nouveau_bind_job_ops = {
> > > +    .submit = nouveau_bind_job_submit,
> > > +    .run = nouveau_bind_job_run,
> > > +    .free = nouveau_bind_job_free,
> > > +};
> > > +
> > > +static int
> > > +bind_job_op_from_uop(struct bind_job_op **pop,
> > > +             struct drm_nouveau_vm_bind_op *uop)
> > > +{
> > > +    struct bind_job_op *op;
> > > +
> > > +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
> > > +    if (!op)
> > > +        return -ENOMEM;
> > > +
> > > +    op->op = uop->op;
> > > +    op->flags = uop->flags;
> > > +    op->va.addr = uop->addr;
> > > +    op->va.range = uop->range;
> > > +
> > > +    if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) {
> > > +        op->gem.handle = uop->handle;
> > > +        op->gem.offset = uop->bo_offset;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static void
> > > +bind_job_ops_free(struct list_head *ops)
> > > +{
> > > +    struct bind_job_op *op, *next;
> > > +
> > > +    list_for_each_op_safe(op, next, ops) {
> > > +        list_del(&op->entry);
> > > +        kfree(op);
> > > +    }
> > > +}
> > > +
> > > +int
> > > +nouveau_bind_job_init(struct nouveau_bind_job **pjob,
> > > +              struct nouveau_exec_bind *bind)
> > > +{
> > > +    struct nouveau_bind_job *job;
> > > +    struct bind_job_op *op;
> > > +    int i, ret;
> > > +
> > > +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
> > > +    if (!job)
> > > +        return -ENOMEM;
> > > +
> > > +    INIT_LIST_HEAD(&job->ops);
> > > +
> > > +    for (i = 0; i < bind->op.count; i++) {
> > > +        ret = bind_job_op_from_uop(&op, &bind->op.s[i]);
> > > +        if (ret)
> > > +            goto err_free;
> > > +
> > > +        list_add_tail(&op->entry, &job->ops);
> > > +    }
> > > +
> > > +    job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
> > > +    job->base.ops = &nouveau_bind_job_ops;
> > > +
> > > +    ret = nouveau_base_job_init(&job->base, &bind->base);
> > > +    if (ret)
> > > +        goto err_free;
> > > +
> > > +    return 0;
> > > +
> > > +err_free:
> > > +    bind_job_ops_free(&job->ops);
> > > +    kfree(job);
> > > +    *pjob = NULL;
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +static int
> > > +sync_find_fence(struct nouveau_job *job,
> > > +        struct drm_nouveau_sync *sync,
> > > +        struct dma_fence **fence)
> > > +{
> > > +    u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
> > > +    u64 point = 0;
> > > +    int ret;
> > > +
> > > +    if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
> > > +        stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> > > +        return -EOPNOTSUPP;
> > > +
> > > +    if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> > > +        point = sync->timeline_value;
> > > +
> > > +    ret = drm_syncobj_find_fence(job->file_priv,
> > > +                     sync->handle, point,
> > > +                     sync->flags, fence);
> > > +    if (ret)
> > > +        return ret;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int
> > > +exec_job_binds_wait(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> > > +    struct nouveau_cli *cli = exec_job->base.cli;
> > > +    struct nouveau_sched_entity *bind_entity = &cli->sched_entity;
> > > +    signed long ret;
> > > +    int i;
> > > +
> > > +    for (i = 0; i < job->in_sync.count; i++) {
> > > +        struct nouveau_job *it;
> > > +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
> > > +        struct dma_fence *fence;
> > > +        bool found;
> > > +
> > > +        ret = sync_find_fence(job, sync, &fence);
> > > +        if (ret)
> > > +            return ret;
> > > +
> > > +        mutex_lock(&bind_entity->job.mutex);
> > > +        found = false;
> > > +        list_for_each_entry(it, &bind_entity->job.list, head) {
> > > +            if (fence == it->done_fence) {
> > > +                found = true;
> > > +                break;
> > > +            }
> > > +        }
> > > +        mutex_unlock(&bind_entity->job.mutex);
> > > +
> > > +        /* If the fence is not from a VM_BIND job, don't wait for it. */
> > > +        if (!found)
> > > +            continue;
> > > +
> > > +        ret = dma_fence_wait_timeout(fence, true,
> > > +                         msecs_to_jiffies(500));
> > > +        if (ret < 0)
> > > +            return ret;
> > > +        else if (ret == 0)
> > > +            return -ETIMEDOUT;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +int
> > > +nouveau_exec_job_submit(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> > > +    struct nouveau_cli *cli = exec_job->base.cli;
> > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
> > > +    struct drm_exec *exec = &job->exec;
> > > +    struct drm_gem_object *obj;
> > > +    unsigned long index;
> > > +    int ret;
> > > +
> > > +    ret = exec_job_binds_wait(job);
> > > +    if (ret)
> > > +        return ret;
> > > +
> > > +    nouveau_uvmm_lock(uvmm);
> > > +    drm_exec_while_not_all_locked(exec) {
> > > +        struct drm_gpuva *va;
> > > +
> > > +        drm_gpuva_for_each_va(va, &uvmm->umgr) {
> > > +            ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
> > > +            drm_exec_break_on_contention(exec);
> > > +            if (ret)
> > > +                return ret;
> > > +        }
> > > +    }
> > > +    nouveau_uvmm_unlock(uvmm);
> > > +
> > > +    drm_exec_for_each_locked_object(exec, index, obj) {
> > > +        struct dma_resv *resv = obj->resv;
> > > +        struct nouveau_bo *nvbo = nouveau_gem_object(obj);
> > > +
> > > +        ret = nouveau_bo_validate(nvbo, true, false);
> > > +        if (ret)
> > > +            return ret;
> > > +
> > > +        dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE);
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static struct dma_fence *
> > > +nouveau_exec_job_run(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> > > +    struct nouveau_fence *fence;
> > > +    int i, ret;
> > > +
> > > +    ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
> > > +    if (ret) {
> > > +        NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
> > > +        return ERR_PTR(ret);
> > > +    }
> > > +
> > > +    for (i = 0; i < exec_job->push.count; i++) {
> > > +        nv50_dma_push(job->chan, exec_job->push.s[i].va,
> > > +                  exec_job->push.s[i].va_len);
> > > +    }
> > > +
> > > +    ret = nouveau_fence_new(job->chan, false, &fence);
> > > +    if (ret) {
> > > +        NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
> > > +        WIND_RING(job->chan);
> > > +        return ERR_PTR(ret);
> > > +    }
> > > +
> > > +    return &fence->base;
> > > +}
> > > +static void
> > > +nouveau_exec_job_free(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
> > > +
> > > +    nouveau_base_job_free(job);
> > > +
> > > +    kfree(exec_job->push.s);
> > > +    kfree(exec_job);
> > > +}
> > > +
> > > +static struct nouveau_job_ops nouveau_exec_job_ops = {
> > > +    .submit = nouveau_exec_job_submit,
> > > +    .run = nouveau_exec_job_run,
> > > +    .free = nouveau_exec_job_free,
> > > +};
> > > +
> > > +int
> > > +nouveau_exec_job_init(struct nouveau_exec_job **pjob,
> > > +              struct nouveau_exec *exec)
> > > +{
> > > +    struct nouveau_exec_job *job;
> > > +    int ret;
> > > +
> > > +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
> > > +    if (!job)
> > > +        return -ENOMEM;
> > > +
> > > +    job->push.count = exec->push.count;
> > > +    job->push.s = kmemdup(exec->push.s,
> > > +                  sizeof(*exec->push.s) *
> > > +                  exec->push.count,
> > > +                  GFP_KERNEL);
> > > +    if (!job->push.s) {
> > > +        ret = -ENOMEM;
> > > +        goto err_free_job;
> > > +    }
> > > +
> > > +    job->base.ops = &nouveau_exec_job_ops;
> > > +    ret = nouveau_base_job_init(&job->base, &exec->base);
> > > +    if (ret)
> > > +        goto err_free_pushs;
> > > +
> > > +    return 0;
> > > +
> > > +err_free_pushs:
> > > +    kfree(job->push.s);
> > > +err_free_job:
> > > +    kfree(job);
> > > +    *pjob = NULL;
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +void nouveau_job_fini(struct nouveau_job *job)
> > > +{
> > > +    dma_fence_put(job->done_fence);
> > > +    drm_sched_job_cleanup(&job->base);
> > > +    job->ops->free(job);
> > > +}
> > > +
> > > +static int
> > > +nouveau_job_add_deps(struct nouveau_job *job)
> > > +{
> > > +    struct dma_fence *in_fence = NULL;
> > > +    int ret, i;
> > > +
> > > +    for (i = 0; i < job->in_sync.count; i++) {
> > > +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
> > > +
> > > +        ret = sync_find_fence(job, sync, &in_fence);
> > > +        if (ret) {
> > > +            NV_PRINTK(warn, job->cli,
> > > +                  "Failed to find syncobj (-> in): handle=%d\n",
> > > +                  sync->handle);
> > > +            return ret;
> > > +        }
> > > +
> > > +        ret = drm_sched_job_add_dependency(&job->base, in_fence);
> > > +        if (ret)
> > > +            return ret;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int
> > > +nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence *fence)
> > > +{
> > > +    struct drm_syncobj *out_sync;
> > > +    int i;
> > > +
> > > +    for (i = 0; i < job->out_sync.count; i++) {
> > > +        struct drm_nouveau_sync *sync = &job->out_sync.s[i];
> > > +        u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
> > > +
> > > +        if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
> > > +            stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
> > > +            return -EOPNOTSUPP;
> > > +
> > > +        out_sync = drm_syncobj_find(job->file_priv, sync->handle);
> > > +        if (!out_sync) {
> > > +            NV_PRINTK(warn, job->cli,
> > > +                  "Failed to find syncobj (-> out): handle=%d\n",
> > > +                  sync->handle);
> > > +            return -ENOENT;
> > > +        }
> > > +
> > > +        if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
> > > +            struct dma_fence_chain *chain;
> > > +
> > > +            chain = dma_fence_chain_alloc();
> > > +            if (!chain) {
> > > +                drm_syncobj_put(out_sync);
> > > +                return -ENOMEM;
> > > +            }
> > > +
> > > +            drm_syncobj_add_point(out_sync, chain, fence,
> > > +                          sync->timeline_value);
> > > +        } else {
> > > +            drm_syncobj_replace_fence(out_sync, fence);
> > > +        }
> > > +
> > > +        drm_syncobj_put(out_sync);
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static struct dma_fence *
> > > +nouveau_job_run(struct nouveau_job *job)
> > > +{
> > > +    return job->ops->run(job);
> > > +}
> > > +
> > > +static int
> > > +nouveau_job_run_sync(struct nouveau_job *job)
> > > +{
> > > +    struct dma_fence *fence;
> > > +    int ret;
> > > +
> > > +    fence = nouveau_job_run(job);
> > > +    if (IS_ERR(fence)) {
> > > +        return PTR_ERR(fence);
> > > +    } else if (fence) {
> > > +        ret = dma_fence_wait(fence, true);
> > > +        if (ret)
> > > +            return ret;
> > > +    }
> > > +
> > > +    dma_fence_signal(job->done_fence);
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +int
> > > +nouveau_job_submit(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_sched_entity *entity = to_nouveau_sched_entity(job->base.entity);
> > > +    int ret;
> > > +
> > > +    drm_exec_init(&job->exec, true);
> > > +
> > > +    ret = nouveau_job_add_deps(job);
> > > +    if (ret)
> > > +        goto out;
> > > +
> > > +    drm_sched_job_arm(&job->base);
> > > +    job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > +
> > > +    ret = nouveau_job_fence_attach(job, job->done_fence);
> > > +    if (ret)
> > > +        goto out;
> > > +
> > > +    if (job->ops->submit) {
> > > +        ret = job->ops->submit(job);
> > > +        if (ret)
> > > +            goto out;
> > > +    }
> > > +
> > > +    if (job->sync) {
> > > +        drm_exec_fini(&job->exec);
> > > +
> > > +        /* We're requested to run a synchronous job, hence don't push
> > > +         * the job, bypassing the job scheduler, and execute the jobs
> > > +         * run() function right away.
> > > +         *
> > > +         * As a consequence of bypassing the job scheduler we need to
> > > +         * handle fencing and job cleanup ourselves.
> > > +         */
> > > +        ret = nouveau_job_run_sync(job);
> > > +
> > > +        /* If the job fails, the caller will do the cleanup for us. */
> > > +        if (!ret)
> > > +            nouveau_job_fini(job);
> > > +
> > > +        return ret;
> > > +    } else {
> > > +        mutex_lock(&entity->job.mutex);
> > > +        drm_sched_entity_push_job(&job->base);
> > > +        list_add_tail(&job->head, &entity->job.list);
> > > +        mutex_unlock(&entity->job.mutex);
> > > +    }
> > > +
> > > +out:
> > > +    drm_exec_fini(&job->exec);
> > > +    return ret;
> > > +}
> > > +
> > > +static struct dma_fence *
> > > +nouveau_sched_run_job(struct drm_sched_job *sched_job)
> > > +{
> > > +    struct nouveau_job *job = to_nouveau_job(sched_job);
> > > +
> > > +    return nouveau_job_run(job);
> > > +}
> > > +
> > > +static enum drm_gpu_sched_stat
> > > +nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
> > > +{
> > > +    struct nouveau_job *job = to_nouveau_job(sched_job);
> > > +    struct nouveau_channel *chan = job->chan;
> > > +
> > > +    if (unlikely(!atomic_read(&chan->killed)))
> > > +        nouveau_channel_kill(chan);
> > > +
> > > +    NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
> > > +          chan->chid);
> > > +
> > > +    nouveau_sched_entity_fini(job->entity);
> > > +
> > > +    return DRM_GPU_SCHED_STAT_ENODEV;
> > > +}
> > > +
> > > +static void
> > > +nouveau_sched_free_job(struct drm_sched_job *sched_job)
> > > +{
> > > +    struct nouveau_job *job = to_nouveau_job(sched_job);
> > > +    struct nouveau_sched_entity *entity = job->entity;
> > > +
> > > +    mutex_lock(&entity->job.mutex);
> > > +    list_del(&job->head);
> > > +    mutex_unlock(&entity->job.mutex);
> > > +
> > > +    nouveau_job_fini(job);
> > > +}
> > > +
> > > +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
> > > +                  struct drm_gpu_scheduler *sched)
> > > +{
> > > +
> > > +    INIT_LIST_HEAD(&entity->job.list);
> > > +    mutex_init(&entity->job.mutex);
> > > +
> > > +    return drm_sched_entity_init(&entity->base,
> > > +                     DRM_SCHED_PRIORITY_NORMAL,
> > > +                     &sched, 1, NULL);
> > > +}
> > > +
> > > +void
> > > +nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
> > > +{
> > > +    drm_sched_entity_destroy(&entity->base);
> > > +}
> > > +
> > > +static const struct drm_sched_backend_ops nouveau_sched_ops = {
> > > +    .run_job = nouveau_sched_run_job,
> > > +    .timedout_job = nouveau_sched_timedout_job,
> > > +    .free_job = nouveau_sched_free_job,
> > > +};
> > > +
> > > +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
> > > +               struct nouveau_drm *drm)
> > > +{
> > > +    long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
> > > +
> > > +    return drm_sched_init(sched, &nouveau_sched_ops,
> > > +                  NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
> > > +                  NULL, NULL, "nouveau", drm->dev->dev);
> > > +}
> > > +
> > > +void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
> > > +{
> > > +    drm_sched_fini(sched);
> > > +}
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
> > > new file mode 100644
> > > index 000000000000..7fc5b7eea810
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
> > > @@ -0,0 +1,98 @@
> > > +// SPDX-License-Identifier: MIT
> > > +
> > > +#ifndef NOUVEAU_SCHED_H
> > > +#define NOUVEAU_SCHED_H
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +#include <drm/drm_exec.h>
> > > +#include <drm/gpu_scheduler.h>
> > > +
> > > +#include "nouveau_drv.h"
> > > +#include "nouveau_exec.h"
> > > +
> > > +#define to_nouveau_job(sched_job)        \
> > > +        container_of((sched_job), struct nouveau_job, base)
> > > +
> > > +#define to_nouveau_exec_job(job)        \
> > > +        container_of((job), struct nouveau_exec_job, base)
> > > +
> > > +#define to_nouveau_bind_job(job)        \
> > > +        container_of((job), struct nouveau_bind_job, base)
> > > +
> > > +struct nouveau_job {
> > > +    struct drm_sched_job base;
> > > +    struct list_head head;
> > > +
> > > +    struct nouveau_sched_entity *entity;
> > > +
> > > +    struct drm_file *file_priv;
> > > +    struct nouveau_cli *cli;
> > > +    struct nouveau_channel *chan;
> > > +
> > > +    struct drm_exec exec;
> > > +    struct dma_fence *done_fence;
> > > +
> > > +    bool sync;
> > > +
> > > +    struct {
> > > +        struct drm_nouveau_sync *s;
> > > +        u32 count;
> > > +    } in_sync;
> > > +
> > > +    struct {
> > > +        struct drm_nouveau_sync *s;
> > > +        u32 count;
> > > +    } out_sync;
> > > +
> > > +    struct nouveau_job_ops {
> > > +        int (*submit)(struct nouveau_job *);
> > > +        struct dma_fence *(*run)(struct nouveau_job *);
> > > +        void (*free)(struct nouveau_job *);
> > > +    } *ops;
> > > +};
> > > +
> > > +struct nouveau_exec_job {
> > > +    struct nouveau_job base;
> > > +
> > > +    struct {
> > > +        struct drm_nouveau_exec_push *s;
> > > +        u32 count;
> > > +    } push;
> > > +};
> > > +
> > > +struct nouveau_bind_job {
> > > +    struct nouveau_job base;
> > > +
> > > +    /* struct bind_job_op */
> > > +    struct list_head ops;
> > > +};
> > > +
> > > +int nouveau_bind_job_init(struct nouveau_bind_job **job,
> > > +              struct nouveau_exec_bind *bind);
> > > +int nouveau_exec_job_init(struct nouveau_exec_job **job,
> > > +              struct nouveau_exec *exec);
> > > +
> > > +int nouveau_job_submit(struct nouveau_job *job);
> > > +void nouveau_job_fini(struct nouveau_job *job);
> > > +
> > > +#define to_nouveau_sched_entity(entity)        \
> > > +        container_of((entity), struct nouveau_sched_entity, base)
> > > +
> > > +struct nouveau_sched_entity {
> > > +    struct drm_sched_entity base;
> > > +    struct {
> > > +        struct list_head list;
> > > +        struct mutex mutex;
> > > +    } job;
> > > +};
> > > +
> > > +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
> > > +                  struct drm_gpu_scheduler *sched);
> > > +void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
> > > +
> > > +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
> > > +               struct nouveau_drm *drm);
> > > +void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
> > > +
> > > +#endif
> > 
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-19  4:04                   ` Danilo Krummrich
@ 2023-01-19  5:23                     ` Matthew Brost
  2023-01-19 11:33                       ` drm_gpuva_manager requirements (was Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI) Christian König
  2023-02-06 14:48                       ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Oded Gabbay
  0 siblings, 2 replies; 75+ messages in thread
From: Matthew Brost @ 2023-01-19  5:23 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Christian König, Dave Airlie, Alex Deucher, jason,
	linux-doc, nouveau, corbet, linux-kernel, dri-devel, bskeggs,
	tzimmermann, airlied

On Thu, Jan 19, 2023 at 05:04:32AM +0100, Danilo Krummrich wrote:
> On 1/18/23 20:48, Christian König wrote:
> > Am 18.01.23 um 20:17 schrieb Dave Airlie:
> > > On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich
> > > > <dakr@redhat.com> wrote:
> > > > > 
> > > > > 
> > > > > On 1/18/23 17:30, Alex Deucher wrote:
> > > > > > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich
> > > > > > <dakr@redhat.com> wrote:
> > > > > > > On 1/18/23 16:37, Christian König wrote:
> > > > > > > > Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> > > > > > > > > Hi Christian,
> > > > > > > > > 
> > > > > > > > > On 1/18/23 09:53, Christian König wrote:
> > > > > > > > > > Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> > > > > > > > > > > This patch series provides a new UAPI for the Nouveau driver in
> > > > > > > > > > > order to
> > > > > > > > > > > support Vulkan features, such as
> > > > > > > > > > > sparse bindings and sparse
> > > > > > > > > > > residency.
> > > > > > > > > > > 
> > > > > > > > > > > Furthermore, with the DRM GPUVA
> > > > > > > > > > > manager it provides a new DRM core
> > > > > > > > > > > feature to
> > > > > > > > > > > keep track of GPU virtual address
> > > > > > > > > > > (VA) mappings in a more generic way.
> > > > > > > > > > > 
> > > > > > > > > > > The DRM GPUVA manager is indented to help drivers implement
> > > > > > > > > > > userspace-manageable
> > > > > > > > > > > GPU VA spaces in reference to the Vulkan API. In order to achieve
> > > > > > > > > > > this goal it
> > > > > > > > > > > serves the following purposes in this context.
> > > > > > > > > > > 
> > > > > > > > > > >        1) Provide a dedicated range allocator to track GPU VA
> > > > > > > > > > > allocations and
> > > > > > > > > > >           mappings, making use of the drm_mm range allocator.
> > > > > > > > > > This means that the ranges are allocated
> > > > > > > > > > by the kernel? If yes that's
> > > > > > > > > > a really really bad idea.
> > > > > > > > > No, it's just for keeping track of the
> > > > > > > > > ranges userspace has allocated.
> > > > > > > > Ok, that makes more sense.
> > > > > > > > 
> > > > > > > > So basically you have an IOCTL which asks kernel
> > > > > > > > for a free range? Or
> > > > > > > > what exactly is the drm_mm used for here?
> > > > > > > Not even that, userspace provides both the base
> > > > > > > address and the range,
> > > > > > > the kernel really just keeps track of things.
> > > > > > > Though, writing a UAPI on
> > > > > > > top of the GPUVA manager asking for a free range instead would be
> > > > > > > possible by just adding the corresponding wrapper functions to get a
> > > > > > > free hole.
> > > > > > > 
> > > > > > > Currently, and that's what I think I read out of
> > > > > > > your question, the main
> > > > > > > benefit of using drm_mm over simply stuffing the
> > > > > > > entries into a list or
> > > > > > > something boils down to easier collision detection and iterating
> > > > > > > sub-ranges of the whole VA space.
> > > > > > Why not just do this in userspace?  We have a range manager in
> > > > > > libdrm_amdgpu that you could lift out into libdrm or some other
> > > > > > helper.
> > > > > The kernel still needs to keep track of the mappings within the various
> > > > > VA spaces, e.g. it silently needs to unmap mappings that are backed by
> > > > > BOs that get evicted and remap them once they're validated (or swapped
> > > > > back in).
> > > > Ok, you are just using this for maintaining the GPU VM space in
> > > > the kernel.
> > > > 
> > > Yes the idea behind having common code wrapping drm_mm for this is to
> > > allow us to make the rules consistent across drivers.
> > > 
> > > Userspace (generally Vulkan, some compute) has interfaces that pretty
> > > much dictate a lot of how VMA tracking works, esp around lifetimes,
> > > sparse mappings and splitting/merging underlying page tables, I'd
> > > really like this to be more consistent across drivers, because already
> > > I think we've seen with freedreno some divergence from amdgpu and we
> > > also have i915/xe to deal with. I'd like to at least have one place
> > > that we can say this is how it should work, since this is something
> > > that *should* be consistent across drivers mostly, as it is more about
> > > how the uapi is exposed.
> > 
> > That's a really good idea, but the implementation with drm_mm won't work
> > like that.
> > 
> > We have Vulkan applications which use the sparse feature to create
> > literally millions of mappings. That's why I have fine tuned the mapping

Is this not an application issue? Millions of mappings seems a bit
absurd to me.

> > structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle
> > possible in the handling of that.

We might need a bit of work here in Xe as our xe_vma structure is quite
big, as we currently use it as a dumping ground for various features.

> 
> That's valuable information. Can you recommend such an application for
> testing / benchmarking?
>

Also interested.
 
> Your optimization effort sounds great. Might it be worth thinking about
> generalizing your approach by itself and stacking the drm_gpuva_manager on
> top of it?
>

FWIW, Xe is on board with the drm_gpuva_manager effort; we basically
open code all of this right now. I'd like to port over to
drm_gpuva_manager ASAP so we can contribute and help find a viable
solution for all of us.

Matt
 
> > 
> > A drm_mm_node is more in the range of ~200 bytes and certainly not
> > suitable for this kind of job.
> > 
> > I strongly suggest to rather use a good bunch of the amdgpu VM code as
> > blueprint for the common infrastructure.
> 
> I will definitely have a look.
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Dave.
> > 
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-19  4:58       ` Matthew Brost
@ 2023-01-19  7:32         ` Thomas Hellström (Intel)
  2023-01-20 10:08         ` Boris Brezillon
  1 sibling, 0 replies; 75+ messages in thread
From: Thomas Hellström (Intel) @ 2023-01-19  7:32 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc


On 1/19/23 05:58, Matthew Brost wrote:
> On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
>> On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
>>> On 1/18/23 07:12, Danilo Krummrich wrote:
>>>> This commit provides the implementation for the new uapi motivated by the
>>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>>
>>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>>      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
>>>>      space managed by the kernel and userspace, respectively.
>>>>
>>>> 2) Allocate and free a VA space region as well as bind and unbind memory
>>>>      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>>      UMDs can request the named operations to be processed either
>>>>      synchronously or asynchronously. It supports DRM syncobjs
>>>>      (incl. timelines) as synchronization mechanism. The management of the
>>>>      GPU VA mappings is implemented with the DRM GPU VA manager.
>>>>
>>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
>>>>      execution happens asynchronously. It supports DRM syncobj (incl.
>>>>      timelines) as synchronization mechanism. DRM GEM object locking is
>>>>      handled with drm_exec.
>>>>
>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
>>>> GPU scheduler for the asynchronous paths.
>>>>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>    Documentation/gpu/driver-uapi.rst       |   3 +
>>>>    drivers/gpu/drm/nouveau/Kbuild          |   2 +
>>>>    drivers/gpu/drm/nouveau/Kconfig         |   2 +
>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
>>>>    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
>>>>    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
>>>>    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
>>>>    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++++++++++
>>>>    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
>>>>    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++
>>>>    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
>>>>    11 files changed, 1295 insertions(+), 4 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>>>    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>> ...
>>>> +static struct dma_fence *
>>>> +nouveau_bind_job_run(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
>>>> +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
>>>> +    struct bind_job_op *op;
>>>> +    int ret = 0;
>>>> +
>>> I was looking at how nouveau does the async binding compared to how xe
>>> does it.
>>> It looks to me that this function being a scheduler run_job callback is
>>> the main part of the VM_BIND dma-fence signalling critical section for
>>> the job's done_fence and if so, needs to be annotated as such?
>> Yes, that's the case.
>>
>>> For example nouveau_uvma_region_new allocates memory, which is not
>>> allowed if in a dma_fence signalling critical section and the locking
>>> also looks suspicious?
>> Thanks for pointing this out, I missed that somehow.
>>
>> I will change it to pre-allocate new regions, mappings and page tables
>> within the job's submit() function.
>>
> Yeah, that's what we basically do in Xe: in the IOCTL step allocate all
> the backing store for new page tables, populate the new page tables
> (these are not yet visible in the page table structure), and in the last
> step, which is executed after all the dependencies are satisfied, program
> all the leaf entries making the new binding visible.
>
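A minimal sketch of that two-step split, with struct vm, pt_prealloc()
and pt_program_leaves() as purely hypothetical stand-ins for the driver's
page table code (not actual Xe or Nouveau functions):

    /* IOCTL/submit step: allocations are still allowed here, since no
     * dma-fence has been published yet.
     */
    static int bind_prepare(struct vm *vm, u64 addr, u64 range)
    {
        /* Allocate backing store for any missing page table levels. */
        return pt_prealloc(vm, addr, range);
    }

    /* Final step, run once all dependencies are satisfied: this is in the
     * dma-fence signalling critical section, so it must not allocate.
     */
    static void bind_commit(struct vm *vm, u64 addr, u64 range)
    {
        /* Only writes the already allocated leaf entries. */
        pt_program_leaves(vm, addr, range);
    }
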
> We screwed this up by deferring most of the IOCTL to a worker, but we
> will fix this one way or another soon - either get rid of the worker or
> introduce a type of sync that is signaled after the worker + publish the
> dma-fence in the worker. I'd like to close on this one soon.
>   
>> For the ops structures the drm_gpuva_manager allocates for reporting the
>> split/merge steps back to the driver I have ideas to entirely avoid
>> allocations, which also is a good thing in respect of Christians feedback
>> regarding the huge amount of mapping requests some applications seem to
>> generate.
>>
> It should be fine to have allocations to report the split/merge step as
> this step should happen before a dma-fence is published, but yeah,
> avoiding extra allocs where possible is always better.
>
> Also BTW, great work on drm_gpuva_manager too. We will most likely
> pick this up in Xe rather than open coding all of this as we currently
> do. We should probably start the port to this soon so we can contribute
> to the implementation and get both of our drivers upstream sooner.
>   
>> Regarding the locking, anything specific that makes it look suspicious to
>> you?
>>
> I haven't looked into this too much, but almost certainly Thomas is
> suggesting that if you allocate memory anywhere under the
> nouveau_uvmm_lock, then you can't use this lock in the run_job()
> callback, as this is in the dma-fencing path.

Yes, that was what looked suspicious to me, although I haven't looked
at the code in detail either to say for sure.

But starting by annotating this with dma_fence_[begin | 
end]_signalling() would help find all issues with this.
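
For illustration, a minimal sketch of such an annotation, hand-added to
the patch's nouveau_sched_run_job() (dma_fence_begin_signalling() /
dma_fence_end_signalling() are the existing lockdep annotation helpers
from <linux/dma-fence.h>; this is not part of the posted series):

    static struct dma_fence *
    nouveau_sched_run_job(struct drm_sched_job *sched_job)
    {
        struct nouveau_job *job = to_nouveau_job(sched_job);
        struct dma_fence *fence;
        bool cookie;

        /* Everything between begin/end is treated as a dma-fence
         * signalling critical section by lockdep: no GFP_KERNEL
         * allocations, no locks that are also held around allocations.
         */
        cookie = dma_fence_begin_signalling();
        fence = nouveau_job_run(job);
        dma_fence_end_signalling(cookie);

        return fence;
    }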

FWIW, by coincidence I discussed drm-scheduler dma-fence annotation
with Daniel Vetter yesterday and it appears he has a patch-set to enable
that, at least for drivers that want to opt in. We probably should try
to get that merged, and then we'd be able to catch this type of thing
earlier.

Thanks,

Thomas



>
> Matt
>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>> +    nouveau_uvmm_lock(uvmm);
>>>> +    list_for_each_op(op, &bind_job->ops) {
>>>> +        switch (op->op) {
>>>> +        case OP_ALLOC: {
>>>> +            bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
>>>> +
>>>> +            ret = nouveau_uvma_region_new(uvmm,
>>>> +                              op->va.addr,
>>>> +                              op->va.range,
>>>> +                              sparse);
>>>> +            if (ret)
>>>> +                goto out_unlock;
>>>> +            break;
>>>> +        }
>>>> +        case OP_FREE:
>>>> +            ret = nouveau_uvma_region_destroy(uvmm,
>>>> +                              op->va.addr,
>>>> +                              op->va.range);
>>>> +            if (ret)
>>>> +                goto out_unlock;
>>>> +            break;
>>>> +        case OP_MAP:
>>>> +            ret = nouveau_uvmm_sm_map(uvmm,
>>>> +                          op->va.addr, op->va.range,
>>>> +                          op->gem.obj, op->gem.offset,
>>>> +                          op->flags && 0xff);
>>>> +            if (ret)
>>>> +                goto out_unlock;
>>>> +            break;
>>>> +        case OP_UNMAP:
>>>> +            ret = nouveau_uvmm_sm_unmap(uvmm,
>>>> +                            op->va.addr,
>>>> +                            op->va.range);
>>>> +            if (ret)
>>>> +                goto out_unlock;
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +out_unlock:
>>>> +    nouveau_uvmm_unlock(uvmm);
>>>> +    if (ret)
>>>> +        NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
>>>> +    return ERR_PTR(ret);
>>>> +}
>>>> +
>>>> +static void
>>>> +nouveau_bind_job_free(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
>>>> +    struct bind_job_op *op, *next;
>>>> +
>>>> +    list_for_each_op_safe(op, next, &bind_job->ops) {
>>>> +        struct drm_gem_object *obj = op->gem.obj;
>>>> +
>>>> +        if (obj)
>>>> +            drm_gem_object_put(obj);
>>>> +
>>>> +        list_del(&op->entry);
>>>> +        kfree(op);
>>>> +    }
>>>> +
>>>> +    nouveau_base_job_free(job);
>>>> +    kfree(bind_job);
>>>> +}
>>>> +
>>>> +static struct nouveau_job_ops nouveau_bind_job_ops = {
>>>> +    .submit = nouveau_bind_job_submit,
>>>> +    .run = nouveau_bind_job_run,
>>>> +    .free = nouveau_bind_job_free,
>>>> +};
>>>> +
>>>> +static int
>>>> +bind_job_op_from_uop(struct bind_job_op **pop,
>>>> +             struct drm_nouveau_vm_bind_op *uop)
>>>> +{
>>>> +    struct bind_job_op *op;
>>>> +
>>>> +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>>>> +    if (!op)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    op->op = uop->op;
>>>> +    op->flags = uop->flags;
>>>> +    op->va.addr = uop->addr;
>>>> +    op->va.range = uop->range;
>>>> +
>>>> +    if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) {
>>>> +        op->gem.handle = uop->handle;
>>>> +        op->gem.offset = uop->bo_offset;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void
>>>> +bind_job_ops_free(struct list_head *ops)
>>>> +{
>>>> +    struct bind_job_op *op, *next;
>>>> +
>>>> +    list_for_each_op_safe(op, next, ops) {
>>>> +        list_del(&op->entry);
>>>> +        kfree(op);
>>>> +    }
>>>> +}
>>>> +
>>>> +int
>>>> +nouveau_bind_job_init(struct nouveau_bind_job **pjob,
>>>> +              struct nouveau_exec_bind *bind)
>>>> +{
>>>> +    struct nouveau_bind_job *job;
>>>> +    struct bind_job_op *op;
>>>> +    int i, ret;
>>>> +
>>>> +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
>>>> +    if (!job)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    INIT_LIST_HEAD(&job->ops);
>>>> +
>>>> +    for (i = 0; i < bind->op.count; i++) {
>>>> +        ret = bind_job_op_from_uop(&op, &bind->op.s[i]);
>>>> +        if (ret)
>>>> +            goto err_free;
>>>> +
>>>> +        list_add_tail(&op->entry, &job->ops);
>>>> +    }
>>>> +
>>>> +    job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC);
>>>> +    job->base.ops = &nouveau_bind_job_ops;
>>>> +
>>>> +    ret = nouveau_base_job_init(&job->base, &bind->base);
>>>> +    if (ret)
>>>> +        goto err_free;
>>>> +
>>>> +    return 0;
>>>> +
>>>> +err_free:
>>>> +    bind_job_ops_free(&job->ops);
>>>> +    kfree(job);
>>>> +    *pjob = NULL;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +sync_find_fence(struct nouveau_job *job,
>>>> +        struct drm_nouveau_sync *sync,
>>>> +        struct dma_fence **fence)
>>>> +{
>>>> +    u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
>>>> +    u64 point = 0;
>>>> +    int ret;
>>>> +
>>>> +    if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
>>>> +        stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>>>> +        return -EOPNOTSUPP;
>>>> +
>>>> +    if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>>>> +        point = sync->timeline_value;
>>>> +
>>>> +    ret = drm_syncobj_find_fence(job->file_priv,
>>>> +                     sync->handle, point,
>>>> +                     sync->flags, fence);
>>>> +    if (ret)
>>>> +        return ret;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int
>>>> +exec_job_binds_wait(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>>>> +    struct nouveau_cli *cli = exec_job->base.cli;
>>>> +    struct nouveau_sched_entity *bind_entity = &cli->sched_entity;
>>>> +    signed long ret;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < job->in_sync.count; i++) {
>>>> +        struct nouveau_job *it;
>>>> +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
>>>> +        struct dma_fence *fence;
>>>> +        bool found;
>>>> +
>>>> +        ret = sync_find_fence(job, sync, &fence);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +
>>>> +        mutex_lock(&bind_entity->job.mutex);
>>>> +        found = false;
>>>> +        list_for_each_entry(it, &bind_entity->job.list, head) {
>>>> +            if (fence == it->done_fence) {
>>>> +                found = true;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +        mutex_unlock(&bind_entity->job.mutex);
>>>> +
>>>> +        /* If the fence is not from a VM_BIND job, don't wait for it. */
>>>> +        if (!found)
>>>> +            continue;
>>>> +
>>>> +        ret = dma_fence_wait_timeout(fence, true,
>>>> +                         msecs_to_jiffies(500));
>>>> +        if (ret < 0)
>>>> +            return ret;
>>>> +        else if (ret == 0)
>>>> +            return -ETIMEDOUT;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int
>>>> +nouveau_exec_job_submit(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>>>> +    struct nouveau_cli *cli = exec_job->base.cli;
>>>> +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli);
>>>> +    struct drm_exec *exec = &job->exec;
>>>> +    struct drm_gem_object *obj;
>>>> +    unsigned long index;
>>>> +    int ret;
>>>> +
>>>> +    ret = exec_job_binds_wait(job);
>>>> +    if (ret)
>>>> +        return ret;
>>>> +
>>>> +    nouveau_uvmm_lock(uvmm);
>>>> +    drm_exec_while_not_all_locked(exec) {
>>>> +        struct drm_gpuva *va;
>>>> +
>>>> +        drm_gpuva_for_each_va(va, &uvmm->umgr) {
>>>> +            ret = drm_exec_prepare_obj(exec, va->gem.obj, 1);
>>>> +            drm_exec_break_on_contention(exec);
>>>> +            if (ret)
>>>> +                return ret;
>>>> +        }
>>>> +    }
>>>> +    nouveau_uvmm_unlock(uvmm);
>>>> +
>>>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>>>> +        struct dma_resv *resv = obj->resv;
>>>> +        struct nouveau_bo *nvbo = nouveau_gem_object(obj);
>>>> +
>>>> +        ret = nouveau_bo_validate(nvbo, true, false);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +
>>>> +        dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static struct dma_fence *
>>>> +nouveau_exec_job_run(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>>>> +    struct nouveau_fence *fence;
>>>> +    int i, ret;
>>>> +
>>>> +    ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16);
>>>> +    if (ret) {
>>>> +        NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret);
>>>> +        return ERR_PTR(ret);
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < exec_job->push.count; i++) {
>>>> +        nv50_dma_push(job->chan, exec_job->push.s[i].va,
>>>> +                  exec_job->push.s[i].va_len);
>>>> +    }
>>>> +
>>>> +    ret = nouveau_fence_new(job->chan, false, &fence);
>>>> +    if (ret) {
>>>> +        NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret);
>>>> +        WIND_RING(job->chan);
>>>> +        return ERR_PTR(ret);
>>>> +    }
>>>> +
>>>> +    return &fence->base;
>>>> +}
>>>> +static void
>>>> +nouveau_exec_job_free(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job);
>>>> +
>>>> +    nouveau_base_job_free(job);
>>>> +
>>>> +    kfree(exec_job->push.s);
>>>> +    kfree(exec_job);
>>>> +}
>>>> +
>>>> +static struct nouveau_job_ops nouveau_exec_job_ops = {
>>>> +    .submit = nouveau_exec_job_submit,
>>>> +    .run = nouveau_exec_job_run,
>>>> +    .free = nouveau_exec_job_free,
>>>> +};
>>>> +
>>>> +int
>>>> +nouveau_exec_job_init(struct nouveau_exec_job **pjob,
>>>> +              struct nouveau_exec *exec)
>>>> +{
>>>> +    struct nouveau_exec_job *job;
>>>> +    int ret;
>>>> +
>>>> +    job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
>>>> +    if (!job)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    job->push.count = exec->push.count;
>>>> +    job->push.s = kmemdup(exec->push.s,
>>>> +                  sizeof(*exec->push.s) *
>>>> +                  exec->push.count,
>>>> +                  GFP_KERNEL);
>>>> +    if (!job->push.s) {
>>>> +        ret = -ENOMEM;
>>>> +        goto err_free_job;
>>>> +    }
>>>> +
>>>> +    job->base.ops = &nouveau_exec_job_ops;
>>>> +    ret = nouveau_base_job_init(&job->base, &exec->base);
>>>> +    if (ret)
>>>> +        goto err_free_pushs;
>>>> +
>>>> +    return 0;
>>>> +
>>>> +err_free_pushs:
>>>> +    kfree(job->push.s);
>>>> +err_free_job:
>>>> +    kfree(job);
>>>> +    *pjob = NULL;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +void nouveau_job_fini(struct nouveau_job *job)
>>>> +{
>>>> +    dma_fence_put(job->done_fence);
>>>> +    drm_sched_job_cleanup(&job->base);
>>>> +    job->ops->free(job);
>>>> +}
>>>> +
>>>> +static int
>>>> +nouveau_job_add_deps(struct nouveau_job *job)
>>>> +{
>>>> +    struct dma_fence *in_fence = NULL;
>>>> +    int ret, i;
>>>> +
>>>> +    for (i = 0; i < job->in_sync.count; i++) {
>>>> +        struct drm_nouveau_sync *sync = &job->in_sync.s[i];
>>>> +
>>>> +        ret = sync_find_fence(job, sync, &in_fence);
>>>> +        if (ret) {
>>>> +            NV_PRINTK(warn, job->cli,
>>>> +                  "Failed to find syncobj (-> in): handle=%d\n",
>>>> +                  sync->handle);
>>>> +            return ret;
>>>> +        }
>>>> +
>>>> +        ret = drm_sched_job_add_dependency(&job->base, in_fence);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int
>>>> +nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence
>>>> *fence)
>>>> +{
>>>> +    struct drm_syncobj *out_sync;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < job->out_sync.count; i++) {
>>>> +        struct drm_nouveau_sync *sync = &job->out_sync.s[i];
>>>> +        u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK;
>>>> +
>>>> +        if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ &&
>>>> +            stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ)
>>>> +            return -EOPNOTSUPP;
>>>> +
>>>> +        out_sync = drm_syncobj_find(job->file_priv, sync->handle);
>>>> +        if (!out_sync) {
>>>> +            NV_PRINTK(warn, job->cli,
>>>> +                  "Failed to find syncobj (-> out): handle=%d\n",
>>>> +                  sync->handle);
>>>> +            return -ENOENT;
>>>> +        }
>>>> +
>>>> +        if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) {
>>>> +            struct dma_fence_chain *chain;
>>>> +
>>>> +            chain = dma_fence_chain_alloc();
>>>> +            if (!chain) {
>>>> +                drm_syncobj_put(out_sync);
>>>> +                return -ENOMEM;
>>>> +            }
>>>> +
>>>> +            drm_syncobj_add_point(out_sync, chain, fence,
>>>> +                          sync->timeline_value);
>>>> +        } else {
>>>> +            drm_syncobj_replace_fence(out_sync, fence);
>>>> +        }
>>>> +
>>>> +        drm_syncobj_put(out_sync);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static struct dma_fence *
>>>> +nouveau_job_run(struct nouveau_job *job)
>>>> +{
>>>> +    return job->ops->run(job);
>>>> +}
>>>> +
>>>> +static int
>>>> +nouveau_job_run_sync(struct nouveau_job *job)
>>>> +{
>>>> +    struct dma_fence *fence;
>>>> +    int ret;
>>>> +
>>>> +    fence = nouveau_job_run(job);
>>>> +    if (IS_ERR(fence)) {
>>>> +        return PTR_ERR(fence);
>>>> +    } else if (fence) {
>>>> +        ret = dma_fence_wait(fence, true);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    dma_fence_signal(job->done_fence);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int
>>>> +nouveau_job_submit(struct nouveau_job *job)
>>>> +{
>>>> +    struct nouveau_sched_entity *entity =
>>>> to_nouveau_sched_entity(job->base.entity);
>>>> +    int ret;
>>>> +
>>>> +    drm_exec_init(&job->exec, true);
>>>> +
>>>> +    ret = nouveau_job_add_deps(job);
>>>> +    if (ret)
>>>> +        goto out;
>>>> +
>>>> +    drm_sched_job_arm(&job->base);
>>>> +    job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>> +
>>>> +    ret = nouveau_job_fence_attach(job, job->done_fence);
>>>> +    if (ret)
>>>> +        goto out;
>>>> +
>>>> +    if (job->ops->submit) {
>>>> +        ret = job->ops->submit(job);
>>>> +        if (ret)
>>>> +            goto out;
>>>> +    }
>>>> +
>>>> +    if (job->sync) {
>>>> +        drm_exec_fini(&job->exec);
>>>> +
>>>> +        /* We're requested to run a synchronous job, hence don't push
>>>> +         * the job, bypassing the job scheduler, and execute the jobs
>>>> +         * run() function right away.
>>>> +         *
>>>> +         * As a consequence of bypassing the job scheduler we need to
>>>> +         * handle fencing and job cleanup ourselfes.
>>>> +         */
>>>> +        ret = nouveau_job_run_sync(job);
>>>> +
>>>> +        /* If the job fails, the caller will do the cleanup for us. */
>>>> +        if (!ret)
>>>> +            nouveau_job_fini(job);
>>>> +
>>>> +        return ret;
>>>> +    } else {
>>>> +        mutex_lock(&entity->job.mutex);
>>>> +        drm_sched_entity_push_job(&job->base);
>>>> +        list_add_tail(&job->head, &entity->job.list);
>>>> +        mutex_unlock(&entity->job.mutex);
>>>> +    }
>>>> +
>>>> +out:
>>>> +    drm_exec_fini(&job->exec);
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static struct dma_fence *
>>>> +nouveau_sched_run_job(struct drm_sched_job *sched_job)
>>>> +{
>>>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>>>> +
>>>> +    return nouveau_job_run(job);
>>>> +}
>>>> +
>>>> +static enum drm_gpu_sched_stat
>>>> +nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
>>>> +{
>>>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>>>> +    struct nouveau_channel *chan = job->chan;
>>>> +
>>>> +    if (unlikely(!atomic_read(&chan->killed)))
>>>> +        nouveau_channel_kill(chan);
>>>> +
>>>> +    NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n",
>>>> +          chan->chid);
>>>> +
>>>> +    nouveau_sched_entity_fini(job->entity);
>>>> +
>>>> +    return DRM_GPU_SCHED_STAT_ENODEV;
>>>> +}
>>>> +
>>>> +static void
>>>> +nouveau_sched_free_job(struct drm_sched_job *sched_job)
>>>> +{
>>>> +    struct nouveau_job *job = to_nouveau_job(sched_job);
>>>> +    struct nouveau_sched_entity *entity = job->entity;
>>>> +
>>>> +    mutex_lock(&entity->job.mutex);
>>>> +    list_del(&job->head);
>>>> +    mutex_unlock(&entity->job.mutex);
>>>> +
>>>> +    nouveau_job_fini(job);
>>>> +}
>>>> +
>>>> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
>>>> +                  struct drm_gpu_scheduler *sched)
>>>> +{
>>>> +
>>>> +    INIT_LIST_HEAD(&entity->job.list);
>>>> +    mutex_init(&entity->job.mutex);
>>>> +
>>>> +    return drm_sched_entity_init(&entity->base,
>>>> +                     DRM_SCHED_PRIORITY_NORMAL,
>>>> +                     &sched, 1, NULL);
>>>> +}
>>>> +
>>>> +void
>>>> +nouveau_sched_entity_fini(struct nouveau_sched_entity *entity)
>>>> +{
>>>> +    drm_sched_entity_destroy(&entity->base);
>>>> +}
>>>> +
>>>> +static const struct drm_sched_backend_ops nouveau_sched_ops = {
>>>> +    .run_job = nouveau_sched_run_job,
>>>> +    .timedout_job = nouveau_sched_timedout_job,
>>>> +    .free_job = nouveau_sched_free_job,
>>>> +};
>>>> +
>>>> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
>>>> +               struct nouveau_drm *drm)
>>>> +{
>>>> +    long job_hang_limit =
>>>> msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
>>>> +
>>>> +    return drm_sched_init(sched, &nouveau_sched_ops,
>>>> +                  NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
>>>> +                  NULL, NULL, "nouveau", drm->dev->dev);
>>>> +}
>>>> +
>>>> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched)
>>>> +{
>>>> +    drm_sched_fini(sched);
>>>> +}
>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h
>>>> b/drivers/gpu/drm/nouveau/nouveau_sched.h
>>>> new file mode 100644
>>>> index 000000000000..7fc5b7eea810
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
>>>> @@ -0,0 +1,98 @@
>>>> +// SPDX-License-Identifier: MIT
>>>> +
>>>> +#ifndef NOUVEAU_SCHED_H
>>>> +#define NOUVEAU_SCHED_H
>>>> +
>>>> +#include <linux/types.h>
>>>> +
>>>> +#include <drm/drm_exec.h>
>>>> +#include <drm/gpu_scheduler.h>
>>>> +
>>>> +#include "nouveau_drv.h"
>>>> +#include "nouveau_exec.h"
>>>> +
>>>> +#define to_nouveau_job(sched_job)        \
>>>> +        container_of((sched_job), struct nouveau_job, base)
>>>> +
>>>> +#define to_nouveau_exec_job(job)        \
>>>> +        container_of((job), struct nouveau_exec_job, base)
>>>> +
>>>> +#define to_nouveau_bind_job(job)        \
>>>> +        container_of((job), struct nouveau_bind_job, base)
>>>> +
>>>> +struct nouveau_job {
>>>> +    struct drm_sched_job base;
>>>> +    struct list_head head;
>>>> +
>>>> +    struct nouveau_sched_entity *entity;
>>>> +
>>>> +    struct drm_file *file_priv;
>>>> +    struct nouveau_cli *cli;
>>>> +    struct nouveau_channel *chan;
>>>> +
>>>> +    struct drm_exec exec;
>>>> +    struct dma_fence *done_fence;
>>>> +
>>>> +    bool sync;
>>>> +
>>>> +    struct {
>>>> +        struct drm_nouveau_sync *s;
>>>> +        u32 count;
>>>> +    } in_sync;
>>>> +
>>>> +    struct {
>>>> +        struct drm_nouveau_sync *s;
>>>> +        u32 count;
>>>> +    } out_sync;
>>>> +
>>>> +    struct nouveau_job_ops {
>>>> +        int (*submit)(struct nouveau_job *);
>>>> +        struct dma_fence *(*run)(struct nouveau_job *);
>>>> +        void (*free)(struct nouveau_job *);
>>>> +    } *ops;
>>>> +};
>>>> +
>>>> +struct nouveau_exec_job {
>>>> +    struct nouveau_job base;
>>>> +
>>>> +    struct {
>>>> +        struct drm_nouveau_exec_push *s;
>>>> +        u32 count;
>>>> +    } push;
>>>> +};
>>>> +
>>>> +struct nouveau_bind_job {
>>>> +    struct nouveau_job base;
>>>> +
>>>> +    /* struct bind_job_op */
>>>> +    struct list_head ops;
>>>> +};
>>>> +
>>>> +int nouveau_bind_job_init(struct nouveau_bind_job **job,
>>>> +              struct nouveau_exec_bind *bind);
>>>> +int nouveau_exec_job_init(struct nouveau_exec_job **job,
>>>> +              struct nouveau_exec *exec);
>>>> +
>>>> +int nouveau_job_submit(struct nouveau_job *job);
>>>> +void nouveau_job_fini(struct nouveau_job *job);
>>>> +
>>>> +#define to_nouveau_sched_entity(entity)        \
>>>> +        container_of((entity), struct nouveau_sched_entity, base)
>>>> +
>>>> +struct nouveau_sched_entity {
>>>> +    struct drm_sched_entity base;
>>>> +    struct {
>>>> +        struct list_head list;
>>>> +        struct mutex mutex;
>>>> +    } job;
>>>> +};
>>>> +
>>>> +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity,
>>>> +                  struct drm_gpu_scheduler *sched);
>>>> +void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity);
>>>> +
>>>> +int nouveau_sched_init(struct drm_gpu_scheduler *sched,
>>>> +               struct nouveau_drm *drm);
>>>> +void nouveau_sched_fini(struct drm_gpu_scheduler *sched);
>>>> +
>>>> +#endif

^ permalink raw reply	[flat|nested] 75+ messages in thread

* drm_gpuva_manager requirements (was Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI)
  2023-01-19  5:23                     ` Matthew Brost
@ 2023-01-19 11:33                       ` Christian König
  2023-02-06 14:48                       ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Oded Gabbay
  1 sibling, 0 replies; 75+ messages in thread
From: Christian König @ 2023-01-19 11:33 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich
  Cc: Dave Airlie, Alex Deucher, jason, linux-doc, nouveau, corbet,
	linux-kernel, dri-devel, bskeggs, tzimmermann, airlied

Am 19.01.23 um 06:23 schrieb Matthew Brost:
> [SNIP]
>>>> Userspace (generally Vulkan, some compute) has interfaces that pretty
>>>> much dictate a lot of how VMA tracking works, esp around lifetimes,
>>>> sparse mappings and splitting/merging underlying page tables, I'd
>>>> really like this to be more consistent across drivers, because already
>>>> I think we've seen with freedreno some divergence from amdgpu and we
>>>> also have i915/xe to deal with. I'd like to at least have one place
>>>> that we can say this is how it should work, since this is something
>>>> that *should* be consistent across drivers mostly, as it is more about
>>>> how the uapi is exposed.
>>> That's a really good idea, but the implementation with drm_mm won't work
>>> like that.
>>>
>>> We have Vulkan applications which use the sparse feature to create
>>> literally millions of mappings. That's why I have fine tuned the mapping
> Is this not an application issue? Millions of mappings seems a bit
> absurd to me.

That's unfortunately how some games are designed these days.

>>> structure in amdgpu down to ~80 bytes IIRC and save every CPU cycle
>>> possible in the handling of that.
> We might need a bit of work here in Xe as our xe_vma structure is quite
> big, as we currently use it as a dumping ground for various features.

We have done that as well and it turned out to be a bad idea. At one 
point we added some power management information into the mapping 
structure, but quickly reverted that.

>> That's valuable information. Can you recommend such an application for
>> testing / benchmarking?
>>
> Also interested.

One of the most demanding ones is Forza Horizon 5. The general approach
of that game seems to be to allocate 64GiB of address space (which equals
16 million 4KiB pages) and then mmap() whatever data it needs into that
self-managed space, assuming that every 4KiB page is individually
mappable to a different location.

>> Your optimization effort sounds great. Might it be worth thinking about
>> generalizing your approach by itself and stacking the drm_gpuva_manager on
>> top of it?
>>
> FWIW, Xe is on board with the drm_gpuva_manager effort; we basically
> open code all of this right now. I'd like to port over to
> drm_gpuva_manager ASAP so we can contribute and help find a viable
> solution for all of us.

Sounds good. I haven't looked into the drm_gpuva_manager code yet, but a
few design notes I've learned from amdgpu:

Separate address space management (drm_mm) from page table management. 
In other words, when an application asks for 64GiB of free address space,
you don't look into the page table structures, but rather into a
separate drm_mm instance. In amdgpu we even moved the latter into
userspace, but the general takeaway is that you have only a handful of
address space requests while you have tons of mapping/unmapping requests.
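
A rough sketch of that separation, with struct my_vm and
my_vm_alloc_range() made up for illustration (only the drm_mm calls are
the real API):

    #include <drm/drm_mm.h>

    /* VA space allocator only; mapping tracking lives elsewhere. */
    struct my_vm {
        struct drm_mm va_mm;    /* free/used VA ranges */
        /* ... separate structure(s) for the actual mappings ... */
    };

    static int my_vm_alloc_range(struct my_vm *vm, struct drm_mm_node *node,
                                 u64 size, u64 align)
    {
        /* Finds a hole in the VA space; page tables are never touched. */
        return drm_mm_insert_node_in_range(&vm->va_mm, node, size, align,
                                           0, 0, U64_MAX,
                                           DRM_MM_INSERT_BEST);
    }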

Separate the tracking structure into two: one for each BO+VM combination
(we call that amdgpu_bo_va) and one for each mapping (called
amdgpu_bo_va_mapping). We unfortunately use that for our hw-dependent
state machine as well, so it isn't easily generalizable.
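
Roughly, such a two-level split could look like this (a simplified,
hypothetical sketch, not the actual amdgpu structures):

    #include <drm/drm_gem.h>

    /* One instance per BO+VM combination. */
    struct bo_vm_binding {
        struct drm_gem_object *obj;
        struct list_head mappings;    /* list of struct vm_mapping */
    };

    /* One instance per mapping; kept as small as possible since there
     * can be millions of these.
     */
    struct vm_mapping {
        struct list_head head;        /* entry in bo_vm_binding.mappings */
        u64 addr;
        u64 range;
        u64 bo_offset;
    };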

I've gone back and forth on merging VMAs and then not merging them
again. Not merging them can save quite a bit of overhead, but results in
many more mappings for some use cases.

Regards,
Christian.

>
> Matt
>   
>>> A drm_mm_node is more in the range of ~200 bytes and certainly not
>>> suitable for this kind of job.
>>>
>>> I strongly suggest to rather use a good bunch of the amdgpu VM code as
>>> blueprint for the common infrastructure.
>> I will definitely have a look.
>>
>>> Regards,
>>> Christian.
>>>
>>>> Dave.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  2023-01-18  6:12 ` [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
@ 2023-01-20  3:37   ` kernel test robot
  0 siblings, 0 replies; 75+ messages in thread
From: kernel test robot @ 2023-01-20  3:37 UTC (permalink / raw)
  To: Danilo Krummrich, daniel, airlied, christian.koenig, bskeggs,
	jason, tzimmermann, mripard, corbet
  Cc: oe-kbuild-all, nouveau, Danilo Krummrich, linux-kernel,
	dri-devel, linux-doc

Hi Danilo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 0b45ac1170ea6416bc1d36798414c04870cd356d]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
base:   0b45ac1170ea6416bc1d36798414c04870cd356d
patch link:    https://lore.kernel.org/r/20230118061256.2689-12-dakr%40redhat.com
patch subject: [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
config: arc-randconfig-s051-20230119 (https://download.01.org/0day-ci/archive/20230120/202301201115.THLpCShO-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/5fca471110e52d7c8db10f9ff483134a546174a1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Danilo-Krummrich/drm-execution-context-for-GEM-buffers/20230118-141552
        git checkout 5fca471110e52d7c8db10f9ff483134a546174a1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arc SHELL=/bin/bash drivers/gpu/drm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

sparse warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:413:34: sparse: sparse: non size-preserving integer to pointer cast

vim +413 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c

   406	
   407	static int
   408	nvkm_uvmm_mthd_raw_unmap(struct nvkm_uvmm *uvmm, struct nvif_vmm_raw_v0 *args)
   409	{
   410		struct nvkm_vmm *vmm = uvmm->vmm;
   411		struct nvkm_vma *vma;
   412	
 > 413		vma = (struct nvkm_vma *)args->handle;
   414		if (!vma)
   415			return -EINVAL;
   416	
   417		mutex_lock(&vmm->mutex);
   418		if (vma->busy) {
   419			VMM_DEBUG(vmm, "denied %016llx: %d", vma->addr, vma->busy);
   420			mutex_unlock(&vmm->mutex);
   421			return -ENOENT;
   422		}
   423		vma->sparse = args->sparse;
   424		nvkm_vmm_raw_unmap_locked(vmm, vma);
   425		mutex_unlock(&vmm->mutex);
   426	
   427		args->handle = 0;
   428		kfree(vma);
   429		return 0;
   430	}
   431	
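
On 32-bit targets like the arc config above the cast truncates the
64-bit handle, which is what sparse warns about. A minimal way to make
the conversion size-preserving would be to cast through uintptr_t (just
a sketch; it does not address whether a raw kernel pointer should be
used as a handle here at all):

    vma = (struct nvkm_vma *)(uintptr_t)args->handle;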

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
  2023-01-19  4:58       ` Matthew Brost
  2023-01-19  7:32         ` Thomas Hellström (Intel)
@ 2023-01-20 10:08         ` Boris Brezillon
  1 sibling, 0 replies; 75+ messages in thread
From: Boris Brezillon @ 2023-01-20 10:08 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Danilo Krummrich, dri-devel, corbet, tzimmermann,
	Thomas Hellström (Intel),
	linux-doc, linux-kernel, bskeggs, jason, nouveau, airlied,
	christian.koenig

On Thu, 19 Jan 2023 04:58:48 +0000
Matthew Brost <matthew.brost@intel.com> wrote:

> > For the ops structures the drm_gpuva_manager allocates for reporting the
> > split/merge steps back to the driver I have ideas to entirely avoid
> > allocations, which also is a good thing in respect of Christians feedback
> > regarding the huge amount of mapping requests some applications seem to
> > generate.
> >  
> 
> It should be fine to have allocations to report the split/merge step as
> this step should happen before a dma-fence is published, but yeah,
> avoiding extra allocs where possible is always better.
> 
> Also BTW, great work on drm_gpuva_manager too. We will most likely
> pick this up in Xe rather than open coding all of this as we currently
> do. We should probably start the port to this soon so we can contribute
> to the implementation and get both of our drivers upstream sooner.

Also quite interested in using this drm_gpuva_manager for pancsf, since
I've been open-coding something similar. I didn't have the
gpuva_region concept to make sure VA mapping/unmapping requests don't
go outside a pre-reserved region, but it seems to automate some
of the stuff I've been doing quite nicely.
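
A bounds check along those lines is small either way; a hypothetical
sketch of what such a region concept guards against (not the actual
drm_gpuva_manager API):

    /* Reject map/unmap requests that leave the pre-reserved region. */
    static bool va_req_in_region(u64 req_addr, u64 req_range,
                                 u64 region_addr, u64 region_range)
    {
        u64 req_end = req_addr + req_range;
        u64 region_end = region_addr + region_range;

        return req_end > req_addr &&    /* non-empty, no overflow */
               req_addr >= region_addr && req_end <= region_end;
    }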

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-19  4:14   ` Bagas Sanjaya
@ 2023-01-20 18:32     ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-20 18:32 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: dri-devel, linux-doc

On 1/19/23 05:14, Bagas Sanjaya wrote:
> On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
>> This adds the infrastructure for a manager implementation to keep track
>> of GPU virtual address (VA) mappings.
> 
> "Add infrastructure for ..."
> 
>> + * Analogue to drm_gpuva_sm_map_ops_create() drm_gpuva_sm_unmap_ops_create()
>> + * provides drivers a the list of operations to be executed in order to unmap
>> + * a range of GPU VA space. The logic behind this functions is way simpler
>> + * though: For all existent mappings enclosed by the given range unmap
>> + * operations are created. For mappings which are only partically located within
>> + * the given range, remap operations are created such that those mappings are
>> + * split up and re-mapped partically.
> 
> "Analogous to ..."
> 
>> + *
>> + * The following paragraph depicts the basic constellations of existent GPU VA
>> + * mappings, a newly requested mapping and the resulting mappings as implemented
>> + * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
>> + * of those constellations.
>> + *
>> + * ::
>> + *
>> + *	1) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	2) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=m)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=m)
>> + *
>> + *
>> + *	3) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	4) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0  a  1
>> + *	old: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or bo_offset.
>> + *
>> + *
>> + *	5) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  b  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0  b  1  a' 2
>> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	6) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  a  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	7) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----| (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2
>> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	8) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	      0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----| (bo_offset=n+1)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	9) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     b     3
>> + *	req:       |-----------| (bo_offset=m)
>> + *
>> + *	     0  a  1     b     3
>> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	10) Existent mapping is merged.
>> + *	-------------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     a     3
>> + *	req:       |-----------| (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	new: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	11) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----|       (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2  a' 3
>> + *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>> + *
>> + *
>> + *	12) Existent mapping is kept.
>> + *	-----------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----|       (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	13) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	14) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	req: |----------------| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	new: |----------------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	15) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	           1     a     3
>> + *	old:       |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     2
>> + *	req: |-----------|       (bo_offset=m)
>> + *
>> + *	     0     b     2  a' 3
>> + *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>> + *
>> + *
>> + *	16) Existent mappings are merged.
>> + *	---------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------|                        (bo_offset=n)
>> + *
>> + *	                            2     a     3
>> + *	old':                       |-----------| (bo_offset=n+2)
>> + *
>> + *	                1     a     2
>> + *	req:            |-----------|             (bo_offset=n+1)
>> + *
>> + *	                      a
>> + *	new: |----------------------------------| (bo_offset=n)
>> + */
> 
> Factor out lists from the big code block above:
> 
> ---- >8 ----
> 

Thanks for your feedback and the patch, it's highly appreciated.

- Danilo

> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index e665f642689d03..411c0aa80bfa1f 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -129,15 +129,14 @@
>    * the given range, remap operations are created such that those mappings are
>    * split up and re-mapped partically.
>    *
> - * The following paragraph depicts the basic constellations of existent GPU VA
> + * The following diagram depicts the basic relationships of existent GPU VA
>    * mappings, a newly requested mapping and the resulting mappings as implemented
> - * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
> - * of those constellations.
> + * by drm_gpuva_sm_map_ops_create()  - it doesn't cover any arbitrary
> + * combinations of these.
>    *
> - * ::
> - *
> - *	1) Existent mapping is kept.
> - *	----------------------------
> + * 1) Existent mapping is kept.
> + *
> + *    ::
>    *
>    *	     0     a     1
>    *	old: |-----------| (bo_offset=n)
> @@ -149,8 +148,9 @@
>    *	new: |-----------| (bo_offset=n)
>    *
>    *
> - *	2) Existent mapping is replaced.
> - *	--------------------------------
> + * 2) Existent mapping is replaced.
> + *
> + *    ::
>    *
>    *	     0     a     1
>    *	old: |-----------| (bo_offset=n)
> @@ -162,8 +162,9 @@
>    *	new: |-----------| (bo_offset=m)
>    *
>    *
> - *	3) Existent mapping is replaced.
> - *	--------------------------------
> + * 3) Existent mapping is replaced.
> + *
> + *    ::
>    *
>    *	     0     a     1
>    *	old: |-----------| (bo_offset=n)
> @@ -175,8 +176,9 @@
>    *	new: |-----------| (bo_offset=n)
>    *
>    *
> - *	4) Existent mapping is replaced.
> - *	--------------------------------
> + * 4) Existent mapping is replaced.
> + *
> + *    ::
>    *
>    *	     0  a  1
>    *	old: |-----|       (bo_offset=n)
> @@ -187,12 +189,14 @@
>    *	     0     a     2
>    *	new: |-----------| (bo_offset=n)
>    *
> - *	Note: We expect to see the same result for a request with a different bo
> - *	      and/or bo_offset.
> + *    .. note::
> + *       We expect to see the same result for a request with a different bo
> + *       and/or bo_offset.
>    *
>    *
> - *	5) Existent mapping is split.
> - *	-----------------------------
> + * 5) Existent mapping is split.
> + *
> + *    ::
>    *
>    *	     0     a     2
>    *	old: |-----------| (bo_offset=n)
> @@ -203,12 +207,14 @@
>    *	     0  b  1  a' 2
>    *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>    *
> - *	Note: We expect to see the same result for a request with a different bo
> - *	      and/or non-contiguous bo_offset.
> + *    .. note::
> + *       We expect to see the same result for a request with a different bo
> + *       and/or non-contiguous bo_offset.
>    *
>    *
> - *	6) Existent mapping is kept.
> - *	----------------------------
> + * 6) Existent mapping is kept.
> + *
> + *    ::
>    *
>    *	     0     a     2
>    *	old: |-----------| (bo_offset=n)
> @@ -220,8 +226,9 @@
>    *	new: |-----------| (bo_offset=n)
>    *
>    *
> - *	7) Existent mapping is split.
> - *	-----------------------------
> + * 7) Existent mapping is split.
> + *
> + *    ::
>    *
>    *	     0     a     2
>    *	old: |-----------| (bo_offset=n)
> @@ -233,8 +240,9 @@
>    *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>    *
>    *
> - *	8) Existent mapping is kept.
> - *	----------------------------
> + * 8) Existent mapping is kept.
> + *
> + *    ::
>    *
>    *	      0     a     2
>    *	old: |-----------| (bo_offset=n)
> @@ -246,8 +254,9 @@
>    *	new: |-----------| (bo_offset=n)
>    *
>    *
> - *	9) Existent mapping is split.
> - *	-----------------------------
> + * 9) Existent mapping is split.
> + *
> + *    ::
>    *
>    *	     0     a     2
>    *	old: |-----------|       (bo_offset=n)
> @@ -259,104 +268,113 @@
>    *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>    *
>    *
> - *	10) Existent mapping is merged.
> - *	-------------------------------
> + * 10) Existent mapping is merged.
>    *
> - *	     0     a     2
> - *	old: |-----------|       (bo_offset=n)
> + *     ::
>    *
> - *	           1     a     3
> - *	req:       |-----------| (bo_offset=n+1)
> + *	      0     a     2
> + *	 old: |-----------|       (bo_offset=n)
>    *
> - *	     0        a        3
> - *	new: |-----------------| (bo_offset=n)
> + *	            1     a     3
> + *	 req:       |-----------| (bo_offset=n+1)
> + *
> + *	      0        a        3
> + *	 new: |-----------------| (bo_offset=n)
>    *
>    *
> - *	11) Existent mapping is split.
> - *	------------------------------
> + * 11) Existent mapping is split.
>    *
> - *	     0        a        3
> - *	old: |-----------------| (bo_offset=n)
> + *     ::
>    *
> - *	           1  b  2
> - *	req:       |-----|       (bo_offset=m)
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
>    *
> - *	     0  a  1  b  2  a' 3
> - *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
> + *	            1  b  2
> + *	 req:       |-----|       (bo_offset=m)
> + *
> + *	      0  a  1  b  2  a' 3
> + *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>    *
>    *
> - *	12) Existent mapping is kept.
> - *	-----------------------------
> + * 12) Existent mapping is kept.
>    *
> - *	     0        a        3
> - *	old: |-----------------| (bo_offset=n)
> + *     ::
>    *
> - *	           1  a  2
> - *	req:       |-----|       (bo_offset=n+1)
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
>    *
> - *	     0        a        3
> - *	old: |-----------------| (bo_offset=n)
> + *	            1  a  2
> + *	 req:       |-----|       (bo_offset=n+1)
> + *
> + *	      0        a        3
> + *	 old: |-----------------| (bo_offset=n)
>    *
>    *
> - *	13) Existent mapping is replaced.
> - *	---------------------------------
> + * 13) Existent mapping is replaced.
>    *
> - *	           1  a  2
> - *	old:       |-----| (bo_offset=n)
> + *     ::
>    *
> - *	     0     a     2
> - *	req: |-----------| (bo_offset=n)
> + *	            1  a  2
> + *	 old:       |-----| (bo_offset=n)
>    *
> - *	     0     a     2
> - *	new: |-----------| (bo_offset=n)
> + *	      0     a     2
> + *	 req: |-----------| (bo_offset=n)
>    *
> - *	Note: We expect to see the same result for a request with a different bo
> - *	      and/or non-contiguous bo_offset.
> + *	      0     a     2
> + *	 new: |-----------| (bo_offset=n)
> + *
> + *     .. note::
> + *        We expect to see the same result for a request with a different bo
> + *        and/or non-contiguous bo_offset.
>    *
>    *
> - *	14) Existent mapping is replaced.
> - *	---------------------------------
> + * 14) Existent mapping is replaced.
>    *
> - *	           1  a  2
> - *	old:       |-----| (bo_offset=n)
> + *     ::
>    *
> - *	     0        a       3
> - *	req: |----------------| (bo_offset=n)
> + *	            1  a  2
> + *	 old:       |-----| (bo_offset=n)
>    *
> - *	     0        a       3
> - *	new: |----------------| (bo_offset=n)
> + *	      0        a       3
> + *	 req: |----------------| (bo_offset=n)
>    *
> - *	Note: We expect to see the same result for a request with a different bo
> - *	      and/or non-contiguous bo_offset.
> + *	      0        a       3
> + *	 new: |----------------| (bo_offset=n)
> + *
> + *     .. note::
> + *        We expect to see the same result for a request with a different bo
> + *        and/or non-contiguous bo_offset.
>    *
>    *
> - *	15) Existent mapping is split.
> - *	------------------------------
> + * 15) Existent mapping is split.
>    *
> - *	           1     a     3
> - *	old:       |-----------| (bo_offset=n)
> + *     ::
>    *
> - *	     0     b     2
> - *	req: |-----------|       (bo_offset=m)
> + *	            1     a     3
> + *	 old:       |-----------| (bo_offset=n)
>    *
> - *	     0     b     2  a' 3
> - *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> + *	      0     b     2
> + *	 req: |-----------|       (bo_offset=m)
> + *
> + *	      0     b     2  a' 3
> + *	 new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>    *
>    *
> - *	16) Existent mappings are merged.
> - *	---------------------------------
> + * 16) Existent mappings are merged.
>    *
> - *	     0     a     1
> - *	old: |-----------|                        (bo_offset=n)
> + *     ::
>    *
> - *	                            2     a     3
> - *	old':                       |-----------| (bo_offset=n+2)
> + *	      0     a     1
> + *	 old: |-----------|                        (bo_offset=n)
>    *
> - *	                1     a     2
> - *	req:            |-----------|             (bo_offset=n+1)
> + *	                             2     a     3
> + *	 old':                       |-----------| (bo_offset=n+2)
>    *
> - *	                      a
> - *	new: |----------------------------------| (bo_offset=n)
> + *	                 1     a     2
> + *	 req:            |-----------|             (bo_offset=n+1)
> + *
> + *	                       a
> + *	 new: |----------------------------------| (bo_offset=n)
>    */
>   
>   /**
> 
> However, the relationship scenario descriptions are too generic (different
> diagrams are described by the same text). Please rewrite them, taking into
> account bo_offset values in each scenario.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
  2023-01-19  4:14   ` Bagas Sanjaya
@ 2023-01-23 23:23   ` Niranjana Vishwanathapura
  2023-01-26 23:43   ` Matthew Brost
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 75+ messages in thread
From: Niranjana Vishwanathapura @ 2023-01-23 23:23 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
>This adds the infrastructure for a manager implementation to keep track
>of GPU virtual address (VA) mappings.
>
>New UAPIs, motivated by the Vulkan sparse memory bindings that graphics drivers
>are starting to implement, allow userspace applications to request multiple and
>arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>intended to serve the following purposes in this context.
>
>1) Provide a dedicated range allocator to track GPU VA allocations and
>   mappings, making use of the drm_mm range allocator.
>
>2) Generically connect GPU VA mappings to their backing buffers, in
>   particular DRM GEM objects.
>
>3) Provide a common implementation to perform more complex mapping
>   operations on the GPU VA space. In particular splitting and merging
>   of GPU VA mappings, e.g. for intersecting mapping requests or partial
>   unmap requests.
>
>Idea-suggested-by: Dave Airlie <airlied@redhat.com>
>Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>---
> Documentation/gpu/drm-mm.rst    |   31 +
> drivers/gpu/drm/Makefile        |    1 +
> drivers/gpu/drm/drm_gem.c       |    3 +
> drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
> include/drm/drm_drv.h           |    6 +
> include/drm/drm_gem.h           |   75 ++
> include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
> 7 files changed, 1966 insertions(+)
> create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> create mode 100644 include/drm/drm_gpuva_mgr.h
>
>diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>index a52e6f4117d6..c9f120cfe730 100644
>--- a/Documentation/gpu/drm-mm.rst
>+++ b/Documentation/gpu/drm-mm.rst
>@@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
> .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>    :export:
>
>+DRM GPU VA Manager
>+==================
>+
>+Overview
>+--------
>+
>+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>+   :doc: Overview
>+
>+Split and Merge
>+---------------
>+
>+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>+   :doc: Split and Merge
>+
>+Locking
>+-------
>+
>+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>+   :doc: Locking
>+
>+
>+DRM GPU VA Manager Function References
>+--------------------------------------
>+
>+.. kernel-doc:: include/drm/drm_gpuva_mgr.h
>+   :internal:
>+
>+.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>+   :export:
>+
> DRM Buddy Allocator
> ===================
>
>diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>index 4fe190aee584..de2ffca3b6e4 100644
>--- a/drivers/gpu/drm/Makefile
>+++ b/drivers/gpu/drm/Makefile
>@@ -45,6 +45,7 @@ drm-y := \
> 	drm_vblank.o \
> 	drm_vblank_work.o \
> 	drm_vma_manager.o \
>+	drm_gpuva_mgr.o \
> 	drm_writeback.o
> drm-$(CONFIG_DRM_LEGACY) += \
> 	drm_agpsupport.o \
>diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>index 59a0bb5ebd85..65115fe88627 100644
>--- a/drivers/gpu/drm/drm_gem.c
>+++ b/drivers/gpu/drm/drm_gem.c
>@@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
> 	if (!obj->resv)
> 		obj->resv = &obj->_resv;
>
>+	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
>+		drm_gem_gpuva_init(obj);
>+
> 	drm_vma_node_reset(&obj->vma_node);
> 	INIT_LIST_HEAD(&obj->lru_node);
> }
>diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
>new file mode 100644
>index 000000000000..e665f642689d
>--- /dev/null
>+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
>@@ -0,0 +1,1323 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (c) 2022 Red Hat.
>+ *
>+ * Permission is hereby granted, free of charge, to any person obtaining a
>+ * copy of this software and associated documentation files (the "Software"),
>+ * to deal in the Software without restriction, including without limitation
>+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>+ * and/or sell copies of the Software, and to permit persons to whom the
>+ * Software is furnished to do so, subject to the following conditions:
>+ *
>+ * The above copyright notice and this permission notice shall be included in
>+ * all copies or substantial portions of the Software.
>+ *
>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>+ * OTHER DEALINGS IN THE SOFTWARE.
>+ *
>+ * Authors:
>+ *     Danilo Krummrich <dakr@redhat.com>
>+ *
>+ */
>+
>+#include <drm/drm_gem.h>
>+#include <drm/drm_gpuva_mgr.h>
>+
>+/**
>+ * DOC: Overview
>+ *
>+ * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
>+ * of a GPU's virtual address (VA) space and manages the corresponding virtual
>+ * mappings represented by &drm_gpuva objects. It also keeps track of the
>+ * mapping's backing &drm_gem_object buffers.
>+ *
>+ * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
>+ * &drm_gpuva objects representing all existent GPU VA mappings using this
>+ * &drm_gem_object as backing buffer.
>+ *
>+ * A GPU VA mapping can only be created within a previously allocated
>+ * &drm_gpuva_region, which represents a reserved portion of the GPU VA space.
>+ * GPU VA mappings are not allowed to span over a &drm_gpuva_region's boundary.
>+ *
>+ * GPU VA regions can also be flagged as sparse, which allows drivers to create
>+ * sparse mappings for a whole GPU VA region in order to support Vulkan
>+ * 'Sparse Resources'.
>+ *

So, are sparse resources the only use case for the VA region abstraction in the uapi?
Or are there any other potential use cases (other than kernel reserved space)?

>+ * The GPU VA manager internally uses the &drm_mm range allocator to manage the
>+ * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
>+ * space.
>+ *
>+ * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
>+ * the &drm_gpuva_manager contains a special region representing the portion of
>+ * VA space reserved by the kernel. This node is initialized together with the
>+ * GPU VA manager instance and removed when the GPU VA manager is destroyed.
>+ *
>+ * In a typical application, drivers would embed struct drm_gpuva_manager,
>+ * struct drm_gpuva_region and struct drm_gpuva within their own driver
>+ * specific structures; hence, the manager doesn't do any memory allocations of
>+ * its own, nor does it allocate &drm_gpuva or &drm_gpuva_region entries.
>+ */
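
To make sure I read the embedding part correctly, a minimal sketch of what a
driver would presumably do (all driver-side names below are mine, not part of
this patch):

        struct my_gpuvm {
                struct drm_gpuva_manager mgr;   /* embedded, no separate allocation */
                struct mutex lock;              /* driver-side lock, see DOC: Locking */
        };

        struct my_va_region {
                struct drm_gpuva_region region; /* embedded GPU VA region */
        };

        struct my_mapping {
                struct drm_gpuva va;            /* embedded GPU VA mapping */
                /* driver specific state, e.g. page table handles */
        };
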
>+
>+/**
>+ * DOC: Split and Merge
>+ *
>+ * The DRM GPU VA manager also provides an algorithm implementing splitting and
>+ * merging of existent GPU VA mappings with the ones that are requested to be
>+ * mapped or unmapped. This feature is required by the Vulkan API to implement
>+ * Vulkan 'Sparse Memory Bindings' - driver UAPIs often refer to this as
>+ * VM BIND.

Looks like the split and merge is only based on address continuity (virtual and
physical).
How about potential page table attributes specified during VM_BIND for a mapping
such as read-only, caching attributes, atomicity, etc.?

Thanks,
Niranjana

>+ *
>+ * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list of map, unmap
>+ * and remap operations for a given newly requested mapping. This list
>+ * represents the set of operations to execute in order to integrate the new
>+ * mapping cleanly into the current state of the GPU VA space.
>+ *
>+ * Depending on how the new GPU VA mapping intersects with the existent mappings
>+ * of the GPU VA space the &drm_gpuva_ops contain an arbitrary amount of unmap
>+ * operations, a maximum of two remap operations and a single map operation.
>+ * The set of operations can also be empty if no operation is required, e.g. if
>+ * the requested mapping already exists in the exact same way.
>+ *
>+ * The single map operation, if existent, represents the original map operation
>+ * requested by the caller. Please note that this operation might have been
>+ * altered compared to the original map operation, e.g. because it was merged
>+ * with an already existent mapping. Hence, drivers must execute this map operation
>+ * instead of the original one they passed to drm_gpuva_sm_map_ops_create().
>+ *
>+ * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
>+ * &drm_gpuva to unmap is physically contiguous with the original mapping
>+ * request. Optionally, if 'keep' is set, drivers may keep the actual page table
>+ * entries for this &drm_gpuva, adding the missing page table entries only and
>+ * update the &drm_gpuva_manager's view of things accordingly.
>+ *
>+ * Drivers may do the same optimization, namely delta page table updates, also
>+ * for remap operations. This is possible since &drm_gpuva_op_remap consists of
>+ * one unmap operation and one or two map operations, such that drivers can
>+ * derive the page table update delta accordingly.
>+ *
>+ * Note that there can't be more than two existent mappings to split up, one at
>+ * the beginning and one at the end of the new mapping, hence there is a
>+ * maximum of two remap operations.
>+ *
>+ * Generally, the DRM GPU VA manager never merges mappings across the
>+ * boundaries of &drm_gpuva_regions. This is the case since merging between
>+ * GPU VA regions would result in unmap and map operations being issued for
>+ * both regions involved although the original mapping request was referred to
>+ * one specific GPU VA region only. Since the other GPU VA region, the one not
>+ * explicitly requested to be altered, might be in use by the GPU, we are not
>+ * allowed to issue any map/unmap operations for this region.
>+ *
>+ * Note that before calling drm_gpuva_sm_map_ops_create() again with another
>+ * mapping request it is necessary to update the &drm_gpuva_manager's view of
>+ * the GPU VA space. The previously obtained operations must be either fully
>+ * processed or completely abandoned.
>+ *
>+ * To update the &drm_gpuva_manager's view of the GPU VA space
>+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>+ * drm_gpuva_destroy_unlocked() should be used.
>+ *
>+ * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
>+ * provides drivers the list of operations to be executed in order to unmap
>+ * a range of GPU VA space. The logic behind this function is way simpler
>+ * though: For all existent mappings enclosed by the given range, unmap
>+ * operations are created. For mappings which are only partially located within
>+ * the given range, remap operations are created such that those mappings are
>+ * split up and re-mapped partially.
>+ *
>+ * The following paragraph depicts the basic constellations of existent GPU VA
>+ * mappings, a newly requested mapping and the resulting mappings as implemented
>+ * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
>+ * of those constellations.
>+ *
>+ * ::
>+ *
>+ *	1) Existent mapping is kept.
>+ *	----------------------------
>+ *
>+ *	     0     a     1
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	     0     a     1
>+ *	req: |-----------| (bo_offset=n)
>+ *
>+ *	     0     a     1
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *
>+ *	2) Existent mapping is replaced.
>+ *	--------------------------------
>+ *
>+ *	     0     a     1
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	     0     a     1
>+ *	req: |-----------| (bo_offset=m)
>+ *
>+ *	     0     a     1
>+ *	new: |-----------| (bo_offset=m)
>+ *
>+ *
>+ *	3) Existent mapping is replaced.
>+ *	--------------------------------
>+ *
>+ *	     0     a     1
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	     0     b     1
>+ *	req: |-----------| (bo_offset=n)
>+ *
>+ *	     0     b     1
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *
>+ *	4) Existent mapping is replaced.
>+ *	--------------------------------
>+ *
>+ *	     0  a  1
>+ *	old: |-----|       (bo_offset=n)
>+ *
>+ *	     0     a     2
>+ *	req: |-----------| (bo_offset=n)
>+ *
>+ *	     0     a     2
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *	Note: We expect to see the same result for a request with a different bo
>+ *	      and/or bo_offset.
>+ *
>+ *
>+ *	5) Existent mapping is split.
>+ *	-----------------------------
>+ *
>+ *	     0     a     2
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	     0  b  1
>+ *	req: |-----|       (bo_offset=n)
>+ *
>+ *	     0  b  1  a' 2
>+ *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>+ *
>+ *	Note: We expect to see the same result for a request with a different bo
>+ *	      and/or non-contiguous bo_offset.
>+ *
>+ *
>+ *	6) Existent mapping is kept.
>+ *	----------------------------
>+ *
>+ *	     0     a     2
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	     0  a  1
>+ *	req: |-----|       (bo_offset=n)
>+ *
>+ *	     0     a     2
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *
>+ *	7) Existent mapping is split.
>+ *	-----------------------------
>+ *
>+ *	     0     a     2
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	           1  b  2
>+ *	req:       |-----| (bo_offset=m)
>+ *
>+ *	     0  a  1  b  2
>+ *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>+ *
>+ *
>+ *	8) Existent mapping is kept.
>+ *	----------------------------
>+ *
>+ *	      0     a     2
>+ *	old: |-----------| (bo_offset=n)
>+ *
>+ *	           1  a  2
>+ *	req:       |-----| (bo_offset=n+1)
>+ *
>+ *	     0     a     2
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *
>+ *	9) Existent mapping is split.
>+ *	-----------------------------
>+ *
>+ *	     0     a     2
>+ *	old: |-----------|       (bo_offset=n)
>+ *
>+ *	           1     b     3
>+ *	req:       |-----------| (bo_offset=m)
>+ *
>+ *	     0  a  1     b     3
>+ *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>+ *
>+ *
>+ *	10) Existent mapping is merged.
>+ *	-------------------------------
>+ *
>+ *	     0     a     2
>+ *	old: |-----------|       (bo_offset=n)
>+ *
>+ *	           1     a     3
>+ *	req:       |-----------| (bo_offset=n+1)
>+ *
>+ *	     0        a        3
>+ *	new: |-----------------| (bo_offset=n)
>+ *
>+ *
>+ *	11) Existent mapping is split.
>+ *	------------------------------
>+ *
>+ *	     0        a        3
>+ *	old: |-----------------| (bo_offset=n)
>+ *
>+ *	           1  b  2
>+ *	req:       |-----|       (bo_offset=m)
>+ *
>+ *	     0  a  1  b  2  a' 3
>+ *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>+ *
>+ *
>+ *	12) Existent mapping is kept.
>+ *	-----------------------------
>+ *
>+ *	     0        a        3
>+ *	old: |-----------------| (bo_offset=n)
>+ *
>+ *	           1  a  2
>+ *	req:       |-----|       (bo_offset=n+1)
>+ *
>+ *	     0        a        3
>+ *	old: |-----------------| (bo_offset=n)
>+ *
>+ *
>+ *	13) Existent mapping is replaced.
>+ *	---------------------------------
>+ *
>+ *	           1  a  2
>+ *	old:       |-----| (bo_offset=n)
>+ *
>+ *	     0     a     2
>+ *	req: |-----------| (bo_offset=n)
>+ *
>+ *	     0     a     2
>+ *	new: |-----------| (bo_offset=n)
>+ *
>+ *	Note: We expect to see the same result for a request with a different bo
>+ *	      and/or non-contiguous bo_offset.
>+ *
>+ *
>+ *	14) Existent mapping is replaced.
>+ *	---------------------------------
>+ *
>+ *	           1  a  2
>+ *	old:       |-----| (bo_offset=n)
>+ *
>+ *	     0        a       3
>+ *	req: |----------------| (bo_offset=n)
>+ *
>+ *	     0        a       3
>+ *	new: |----------------| (bo_offset=n)
>+ *
>+ *	Note: We expect to see the same result for a request with a different bo
>+ *	      and/or non-contiguous bo_offset.
>+ *
>+ *
>+ *	15) Existent mapping is split.
>+ *	------------------------------
>+ *
>+ *	           1     a     3
>+ *	old:       |-----------| (bo_offset=n)
>+ *
>+ *	     0     b     2
>+ *	req: |-----------|       (bo_offset=m)
>+ *
>+ *	     0     b     2  a' 3
>+ *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>+ *
>+ *
>+ *	16) Existent mappings are merged.
>+ *	---------------------------------
>+ *
>+ *	     0     a     1
>+ *	old: |-----------|                        (bo_offset=n)
>+ *
>+ *	                            2     a     3
>+ *	old':                       |-----------| (bo_offset=n+2)
>+ *
>+ *	                1     a     2
>+ *	req:            |-----------|             (bo_offset=n+1)
>+ *
>+ *	                      a
>+ *	new: |----------------------------------| (bo_offset=n)
>+ */
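
For readers following along, my rough understanding of the intended driver flow
around drm_gpuva_sm_map_ops_create() is sketched below (error handling trimmed,
driver-side names are mine; the exact insert/link calls depend on the driver's
locking scheme):

        struct drm_gpuva_ops *ops;
        struct drm_gpuva_op *op;

        ops = drm_gpuva_sm_map_ops_create(mgr, req_addr, req_range,
                                          req_obj, req_offset);
        if (IS_ERR(ops))
                return PTR_ERR(ops);

        drm_gpuva_for_each_op(op, ops) {
                switch (op->op) {
                case DRM_GPUVA_OP_MAP:
                        /* program the page tables, then drm_gpuva_insert() the
                         * new &drm_gpuva and link it to its GEM object
                         */
                        break;
                case DRM_GPUVA_OP_REMAP:
                        /* unmap op->remap.unmap->va (possibly keeping the PTEs),
                         * then map op->remap.prev and/or op->remap.next
                         */
                        break;
                case DRM_GPUVA_OP_UNMAP:
                        /* tear down op->unmap.va, then
                         * drm_gpuva_destroy_locked()/_unlocked()
                         */
                        break;
                default:
                        break;
                }
        }

        drm_gpuva_ops_free(ops);
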
>+
>+/**
>+ * DOC: Locking
>+ *
>+ * Generally, the GPU VA manager does not take care of locking itself; it is
>+ * the driver's responsibility to take care of locking. Drivers might want to
>+ * protect the following operations: inserting, destroying and iterating
>+ * &drm_gpuva and &drm_gpuva_region objects as well as generating split and merge
>+ * operations.
>+ *
>+ * The GPU VA manager does take care of locking the backing &drm_gem_object
>+ * buffers' GPU VA lists though, unless the documentation of the provided
>+ * functions claims otherwise.
>+ */
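
To double check my understanding of the locking rules, this is what I'd expect
a driver to do when inserting a mapping (uvmm->lock is a hypothetical
driver-side lock, va is assumed to be embedded in a driver structure):

        /* va->gem.obj and va->gem.offset must be set up front */

        /* manipulation of the VA space itself is serialized by the driver */
        mutex_lock(&uvmm->lock);
        ret = drm_gpuva_insert(&uvmm->mgr, va, addr, range);
        mutex_unlock(&uvmm->lock);
        if (ret)
                return ret;

        /* the GEM object's GPU VA list mutex is taken internally here */
        drm_gpuva_link_unlocked(va);
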
>+
>+/**
>+ * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
>+ * @mgr: pointer to the &drm_gpuva_manager to initialize
>+ * @name: the name of the GPU VA space
>+ * @start_offset: the start offset of the GPU VA space
>+ * @range: the size of the GPU VA space
>+ * @reserve_offset: the start of the kernel reserved GPU VA area
>+ * @reserve_range: the size of the kernel reserved GPU VA area
>+ *
>+ * The &drm_gpuva_manager must be initialized with this function before use.
>+ *
>+ * Note that @mgr must be cleared to 0 before calling this function. The given
>+ * &name is expected to be managed by the surrounding driver structures.
>+ */
>+void
>+drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>+		       const char *name,
>+		       u64 start_offset, u64 range,
>+		       u64 reserve_offset, u64 reserve_range)
>+{
>+	drm_mm_init(&mgr->va_mm, start_offset, range);
>+	drm_mm_init(&mgr->region_mm, start_offset, range);
>+
>+	mgr->mm_start = start_offset;
>+	mgr->mm_range = range;
>+
>+	mgr->name = name ? name : "unknown";
>+
>+	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
>+	mgr->kernel_alloc_node.start = reserve_offset;
>+	mgr->kernel_alloc_node.size = reserve_range;
>+	drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
>+}
>+EXPORT_SYMBOL(drm_gpuva_manager_init);
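
For instance, I'd expect an init call to look roughly like this (the VA space
layout below is made up):

        /* uvmm is assumed to be kzalloc'd, hence @mgr is already zeroed */
        drm_gpuva_manager_init(&uvmm->mgr, "example-vm",
                               0, 1ULL << 47,   /* start_offset, range */
                               0, 1ULL << 20);  /* kernel reserved area */
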
>+
>+/**
>+ * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
>+ * @mgr: pointer to the &drm_gpuva_manager to clean up
>+ *
>+ * Note that it is a bug to call this function on a manager that still
>+ * holds GPU VA mappings.
>+ */
>+void
>+drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>+{
>+	mgr->name = NULL;
>+	drm_mm_remove_node(&mgr->kernel_alloc_node);
>+	drm_mm_takedown(&mgr->va_mm);
>+	drm_mm_takedown(&mgr->region_mm);
>+}
>+EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>+
>+static struct drm_gpuva_region *
>+drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>+{
>+	struct drm_gpuva_region *reg;
>+
>+	/* Find the VA region the requested range is strictly enclosed by. */
>+	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
>+		if (reg->node.start <= addr &&
>+		    reg->node.start + reg->node.size >= addr + range &&
>+		    &reg->node != &mgr->kernel_alloc_node)
>+			return reg;
>+	}
>+
>+	return NULL;
>+}
>+
>+static bool
>+drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>+{
>+	return !!drm_gpuva_in_region(mgr, addr, range);
>+}
>+
>+/**
>+ * drm_gpuva_insert - insert a &drm_gpuva
>+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>+ * @va: the &drm_gpuva to insert
>+ * @addr: the start address of the GPU VA
>+ * @range: the range of the GPU VA
>+ *
>+ * Insert a &drm_gpuva with a given address and range into a
>+ * &drm_gpuva_manager.
>+ *
>+ * The function assumes the caller does not hold the &drm_gem_object's
>+ * GPU VA list mutex.
>+ *
>+ * Returns: 0 on success, negative error code on failure.
>+ */
>+int
>+drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>+		 struct drm_gpuva *va,
>+		 u64 addr, u64 range)
>+{
>+	struct drm_gpuva_region *reg;
>+	int ret;
>+
>+	if (!va->gem.obj)
>+		return -EINVAL;
>+
>+	reg = drm_gpuva_in_region(mgr, addr, range);
>+	if (!reg)
>+		return -EINVAL;
>+
>+	ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
>+					  range, 0,
>+					  0, addr,
>+					  addr + range,
>+					  DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
>+	if (ret)
>+		return ret;
>+
>+	va->mgr = mgr;
>+	va->region = reg;
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(drm_gpuva_insert);
>+
>+/**
>+ * drm_gpuva_link_locked - link a &drm_gpuva
>+ * @va: the &drm_gpuva to link
>+ *
>+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>+ * associated with.
>+ *
>+ * The function assumes the caller already holds the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_link_locked(struct drm_gpuva *va)
>+{
>+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>+	list_add_tail(&va->head, &va->gem.obj->gpuva.list);
>+}
>+EXPORT_SYMBOL(drm_gpuva_link_locked);
>+
>+/**
>+ * drm_gpuva_link_unlocked - link a &drm_gpuva
>+ * @va: the &drm_gpuva to link
>+ *
>+ * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>+ * associated with.
>+ *
>+ * The function assumes the caller does not hold the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_link_unlocked(struct drm_gpuva *va)
>+{
>+	drm_gem_gpuva_lock(va->gem.obj);
>+	drm_gpuva_link_locked(va);
>+	drm_gem_gpuva_unlock(va->gem.obj);
>+}
>+EXPORT_SYMBOL(drm_gpuva_link_unlocked);
>+
>+/**
>+ * drm_gpuva_unlink_locked - unlink a &drm_gpuva
>+ * @va: the &drm_gpuva to unlink
>+ *
>+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>+ * associated with.
>+ *
>+ * The function assumes the caller already holds the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_unlink_locked(struct drm_gpuva *va)
>+{
>+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>+	list_del_init(&va->head);
>+}
>+EXPORT_SYMBOL(drm_gpuva_unlink_locked);
>+
>+/**
>+ * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
>+ * @va: the &drm_gpuva to unlink
>+ *
>+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>+ * associated with.
>+ *
>+ * The function assumes the caller does not hold the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
>+{
>+	drm_gem_gpuva_lock(va->gem.obj);
>+	drm_gpuva_unlink_locked(va);
>+	drm_gem_gpuva_unlock(va->gem.obj);
>+}
>+EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
>+
>+/**
>+ * drm_gpuva_destroy_locked - destroy a &drm_gpuva
>+ * @va: the &drm_gpuva to destroy
>+ *
>+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>+ * associated with and removes it from the underlying range allocator.
>+ *
>+ * The function assumes the caller already holds the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_destroy_locked(struct drm_gpuva *va)
>+{
>+	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>+
>+	list_del(&va->head);
>+	drm_mm_remove_node(&va->node);
>+}
>+EXPORT_SYMBOL(drm_gpuva_destroy_locked);
>+
>+/**
>+ * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
>+ * @va: the &drm_gpuva to destroy
>+ *
>+ * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>+ * associated with and removes it from the underlying range allocator.
>+ *
>+ * The function assumes the caller does not hold the &drm_gem_object's
>+ * GPU VA list mutex.
>+ */
>+void
>+drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
>+{
>+	drm_gem_gpuva_lock(va->gem.obj);
>+	list_del(&va->head);
>+	drm_gem_gpuva_unlock(va->gem.obj);
>+
>+	drm_mm_remove_node(&va->node);
>+}
>+EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
>+
>+/**
>+ * drm_gpuva_find - find a &drm_gpuva
>+ * @mgr: the &drm_gpuva_manager to search in
>+ * @addr: the &drm_gpuvas address
>+ * @range: the &drm_gpuvas range
>+ *
>+ * Returns: the &drm_gpuva at a given &addr and with a given &range
>+ */
>+struct drm_gpuva *
>+drm_gpuva_find(struct drm_gpuva_manager *mgr,
>+	       u64 addr, u64 range)
>+{
>+	struct drm_gpuva *va;
>+
>+	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
>+		if (va->node.start == addr &&
>+		    va->node.size == range)
>+			return va;
>+	}
>+
>+	return NULL;
>+}
>+EXPORT_SYMBOL(drm_gpuva_find);
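
Just to confirm the lookup semantics: I assume this is an exact-match lookup,
i.e. addr/range must match the values previously passed to drm_gpuva_insert(),
roughly:

        va = drm_gpuva_find(&uvmm->mgr, addr, range);
        if (!va)
                return -ENOENT;
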
>+
>+/**
>+ * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>+ * @mgr: the &drm_gpuva_manager to search in
>+ * @start: the given GPU VA's start address
>+ *
>+ * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
>+ *
>+ * Note that if there is any free space between the GPU VA mappings no mapping
>+ * is returned.
>+ *
>+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>+ */
>+struct drm_gpuva *
>+drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>+{
>+	struct drm_mm_node *node;
>+
>+	if (start <= mgr->mm_start ||
>+	    start > (mgr->mm_start + mgr->mm_range))
>+		return NULL;
>+
>+	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
>+	if (node == &mgr->va_mm.head_node)
>+		return NULL;
>+
>+	return (struct drm_gpuva *)node;
>+}
>+EXPORT_SYMBOL(drm_gpuva_find_prev);
>+
>+/**
>+ * drm_gpuva_find_next - find the &drm_gpuva after the given address
>+ * @mgr: the &drm_gpuva_manager to search in
>+ * @end: the given GPU VA's end address
>+ *
>+ * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
>+ *
>+ * Note that if there is any free space between the GPU VA mappings no mapping
>+ * is returned.
>+ *
>+ * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>+ */
>+struct drm_gpuva *
>+drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
>+{
>+	struct drm_mm_node *node;
>+
>+	if (end < mgr->mm_start ||
>+	    end >= (mgr->mm_start + mgr->mm_range))
>+		return NULL;
>+
>+	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
>+	if (node == &mgr->va_mm.head_node)
>+		return NULL;
>+
>+	return (struct drm_gpuva *)node;
>+}
>+EXPORT_SYMBOL(drm_gpuva_find_next);
>+
>+/**
>+ * drm_gpuva_region_insert - insert a &drm_gpuva_region
>+ * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva_region in
>+ * @reg: the &drm_gpuva_region to insert
>+ * @addr: the start address of the GPU VA
>+ * @range: the range of the GPU VA
>+ *
>+ * Insert a &drm_gpuva_region with a given address and range into a
>+ * &drm_gpuva_manager.
>+ *
>+ * Returns: 0 on success, negative error code on failure.
>+ */
>+int
>+drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>+			struct drm_gpuva_region *reg,
>+			u64 addr, u64 range)
>+{
>+	int ret;
>+
>+	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
>+					  range, 0,
>+					  0, addr,
>+					  addr + range,
>+					  DRM_MM_INSERT_LOW|
>+					  DRM_MM_INSERT_ONCE);
>+	if (ret)
>+		return ret;
>+
>+	reg->mgr = mgr;
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(drm_gpuva_region_insert);
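
To illustrate how I read the region/mapping split, a sketch (error handling
omitted, names are mine):

        /* reserve a (sparse) chunk of VA space first ... */
        reg->sparse = true;
        ret = drm_gpuva_region_insert(&uvmm->mgr, reg, region_addr, region_range);
        if (ret)
                return ret;

        /* ... mappings may then only be created fully inside
         * [region_addr, region_addr + region_range)
         */
        ret = drm_gpuva_insert(&uvmm->mgr, va, map_addr, map_range);
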
>+
>+/**
>+ * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
>+ * @mgr: the &drm_gpuva_manager holding the region
>+ * @reg: the &drm_gpuva_region to destroy
>+ *
>+ * This removes the given &reg from the underlying range allocator.
>+ */
>+void
>+drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>+			 struct drm_gpuva_region *reg)
>+{
>+	struct drm_gpuva *va;
>+
>+	drm_gpuva_for_each_va_in_range(va, mgr,
>+				       reg->node.start,
>+				       reg->node.size) {
>+		WARN(1, "GPU VA region must be empty on destroy.\n");
>+		return;
>+	}
>+
>+	if (&reg->node == &mgr->kernel_alloc_node) {
>+		WARN(1, "Can't destroy kernel reserved region.\n");
>+		return;
>+	}
>+
>+	drm_mm_remove_node(&reg->node);
>+}
>+EXPORT_SYMBOL(drm_gpuva_region_destroy);
>+
>+/**
>+ * drm_gpuva_region_find - find a &drm_gpuva_region
>+ * @mgr: the &drm_gpuva_manager to search in
>+ * @addr: the &drm_gpuva_regions address
>+ * @range: the &drm_gpuva_regions range
>+ *
>+ * Returns: the &drm_gpuva_region at a given &addr and with a given &range
>+ */
>+struct drm_gpuva_region *
>+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>+		      u64 addr, u64 range)
>+{
>+	struct drm_gpuva_region *reg;
>+
>+	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
>+		if (reg->node.start == addr &&
>+		    reg->node.size == range)
>+			return reg;
>+
>+	return NULL;
>+}
>+EXPORT_SYMBOL(drm_gpuva_region_find);
>+
>+static int
>+gpuva_op_map_new(struct drm_gpuva_op **pop,
>+		 u64 addr, u64 range,
>+		 struct drm_gem_object *obj, u64 offset)
>+{
>+	struct drm_gpuva_op *op;
>+
>+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>+	if (!op)
>+		return -ENOMEM;
>+
>+	op->op = DRM_GPUVA_OP_MAP;
>+	op->map.va.addr = addr;
>+	op->map.va.range = range;
>+	op->map.gem.obj = obj;
>+	op->map.gem.offset = offset;
>+
>+	return 0;
>+}
>+
>+static int
>+gpuva_op_remap_new(struct drm_gpuva_op **pop,
>+		   struct drm_gpuva_op_map *prev,
>+		   struct drm_gpuva_op_map *next,
>+		   struct drm_gpuva_op_unmap *unmap)
>+{
>+	struct drm_gpuva_op *op;
>+	struct drm_gpuva_op_remap *r;
>+
>+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>+	if (!op)
>+		return -ENOMEM;
>+
>+	op->op = DRM_GPUVA_OP_REMAP;
>+	r = &op->remap;
>+
>+	if (prev) {
>+		r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
>+		if (!r->prev)
>+			goto err_free_op;
>+	}
>+
>+	if (next) {
>+		r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
>+		if (!r->next)
>+			goto err_free_prev;
>+	}
>+
>+	r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
>+	if (!r->unmap)
>+		goto err_free_next;
>+
>+	return 0;
>+
>+err_free_next:
>+	if (next)
>+		kfree(r->next);
>+err_free_prev:
>+	if (prev)
>+		kfree(r->prev);
>+err_free_op:
>+	kfree(op);
>+	*pop = NULL;
>+
>+	return -ENOMEM;
>+}
>+
>+static int
>+gpuva_op_unmap_new(struct drm_gpuva_op **pop,
>+		   struct drm_gpuva *va, bool merge)
>+{
>+	struct drm_gpuva_op *op;
>+
>+	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>+	if (!op)
>+		return -ENOMEM;
>+
>+	op->op = DRM_GPUVA_OP_UNMAP;
>+	op->unmap.va = va;
>+	op->unmap.keep = merge;
>+
>+	return 0;
>+}
>+
>+#define op_map_new_to_list(_ops, _addr, _range,		\
>+			   _obj, _offset)		\
>+do {							\
>+	struct drm_gpuva_op *op;			\
>+							\
>+	ret = gpuva_op_map_new(&op, _addr, _range,	\
>+			       _obj, _offset);		\
>+	if (ret)					\
>+		goto err_free_ops;			\
>+							\
>+	list_add_tail(&op->entry, _ops);		\
>+} while (0)
>+
>+#define op_remap_new_to_list(_ops, _prev, _next,	\
>+			     _unmap)			\
>+do {							\
>+	struct drm_gpuva_op *op;			\
>+							\
>+	ret = gpuva_op_remap_new(&op, _prev, _next,	\
>+				 _unmap);		\
>+	if (ret)					\
>+		goto err_free_ops;			\
>+							\
>+	list_add_tail(&op->entry, _ops);		\
>+} while (0)
>+
>+#define op_unmap_new_to_list(_ops, _gpuva, _merge)	\
>+do {							\
>+	struct drm_gpuva_op *op;			\
>+							\
>+	ret = gpuva_op_unmap_new(&op, _gpuva, _merge);	\
>+	if (ret)					\
>+		goto err_free_ops;			\
>+							\
>+	list_add_tail(&op->entry, _ops);		\
>+} while (0)
>+
>+/**
>+ * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
>+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
>+ * @req_addr: the start address of the new mapping
>+ * @req_range: the range of the new mapping
>+ * @req_obj: the &drm_gem_object to map
>+ * @req_offset: the offset within the &drm_gem_object
>+ *
>+ * This function creates a list of operations to perform splitting and merging
>+ * of existent mapping(s) with the newly requested one.
>+ *
>+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>+ * in the given order. It can contain map, unmap and remap operations, but it
>+ * also can be empty if no operation is required, e.g. if the requested mapping
>+ * already exists in the exact same way.
>+ *
>+ * There can be an arbitrary amount of unmap operations, a maximum of two remap
>+ * operations and a single map operation. The latter one, if existent,
>+ * represents the original map operation requested by the caller. Please note
>+ * that the map operation might have been modified, e.g. if it was
>+ * merged with an existent mapping.
>+ *
>+ * Note that before calling this function again with another mapping request it
>+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>+ * The previously obtained operations must be either processed or abandoned.
>+ * To update the &drm_gpuva_manager's view of the GPU VA space
>+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>+ * drm_gpuva_destroy_unlocked() should be used.
>+ *
>+ * After the caller finished processing the returned &drm_gpuva_ops, they must
>+ * be freed with &drm_gpuva_ops_free.
>+ *
>+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>+ */
>+struct drm_gpuva_ops *
>+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>+			    u64 req_addr, u64 req_range,
>+			    struct drm_gem_object *req_obj, u64 req_offset)
>+{
>+	struct drm_gpuva_ops *ops;
>+	struct drm_gpuva *va, *prev = NULL;
>+	u64 req_end = req_addr + req_range;
>+	bool skip_pmerge = false, skip_nmerge = false;
>+	int ret;
>+
>+	if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
>+		return ERR_PTR(-EINVAL);
>+
>+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>+	if (!ops)
>+		return ERR_PTR(-ENOMEM);
>+
>+	INIT_LIST_HEAD(&ops->list);
>+
>+	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>+		struct drm_gem_object *obj = va->gem.obj;
>+		u64 offset = va->gem.offset;
>+		u64 addr = va->node.start;
>+		u64 range = va->node.size;
>+		u64 end = addr + range;
>+
>+		/* Generally, we want to skip merging with potential mappings
>+		 * left and right of the requested one when we found a
>+		 * collision, since merging happens in this loop already.
>+		 *
>+		 * However, there is one exception when the requested mapping
>+		 * spans into a free VM area. If this is the case we might
>+		 * still hit the boundary of another mapping before and/or
>+		 * after the free VM area.
>+		 */
>+		skip_pmerge = true;
>+		skip_nmerge = true;
>+
>+		if (addr == req_addr) {
>+			bool merge = obj == req_obj &&
>+				     offset == req_offset;
>+			if (end == req_end) {
>+				if (merge)
>+					goto done;
>+
>+				op_unmap_new_to_list(&ops->list, va, false);
>+				break;
>+			}
>+
>+			if (end < req_end) {
>+				skip_nmerge = false;
>+				op_unmap_new_to_list(&ops->list, va, merge);
>+				goto next;
>+			}
>+
>+			if (end > req_end) {
>+				struct drm_gpuva_op_map n = {
>+					.va.addr = req_end,
>+					.va.range = range - req_range,
>+					.gem.obj = obj,
>+					.gem.offset = offset + req_range,
>+				};
>+				struct drm_gpuva_op_unmap u = { .va = va };
>+
>+				if (merge)
>+					goto done;
>+
>+				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>+				break;
>+			}
>+		} else if (addr < req_addr) {
>+			u64 ls_range = req_addr - addr;
>+			struct drm_gpuva_op_map p = {
>+				.va.addr = addr,
>+				.va.range = ls_range,
>+				.gem.obj = obj,
>+				.gem.offset = offset,
>+			};
>+			struct drm_gpuva_op_unmap u = { .va = va };
>+			bool merge = obj == req_obj &&
>+				     offset + ls_range == req_offset;
>+
>+			if (end == req_end) {
>+				if (merge)
>+					goto done;
>+
>+				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>+				break;
>+			}
>+
>+			if (end < req_end) {
>+				u64 new_addr = addr;
>+				u64 new_range = req_range + ls_range;
>+				u64 new_offset = offset;
>+
>+				/* We validated that the requested mapping is
>+				 * within a single VA region already.
>+				 * Since it overlaps the current mapping (which
>+				 * can't cross a VA region boundary) we can be
>+				 * sure that we're still within the boundaries
>+				 * of the same VA region after merging.
>+				 */
>+				if (merge) {
>+					req_offset = new_offset;
>+					req_addr = new_addr;
>+					req_range = new_range;
>+					op_unmap_new_to_list(&ops->list, va, true);
>+					goto next;
>+				}
>+
>+				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>+				goto next;
>+			}
>+
>+			if (end > req_end) {
>+				struct drm_gpuva_op_map n = {
>+					.va.addr = req_end,
>+					.va.range = end - req_end,
>+					.gem.obj = obj,
>+					.gem.offset = offset + ls_range +
>+						      req_range,
>+				};
>+
>+				if (merge)
>+					goto done;
>+
>+				op_remap_new_to_list(&ops->list, &p, &n, &u);
>+				break;
>+			}
>+		} else if (addr > req_addr) {
>+			bool merge = obj == req_obj &&
>+				     offset == req_offset +
>+					       (addr - req_addr);
>+			if (!prev)
>+				skip_pmerge = false;
>+
>+			if (end == req_end) {
>+				op_unmap_new_to_list(&ops->list, va, merge);
>+				break;
>+			}
>+
>+			if (end < req_end) {
>+				skip_nmerge = false;
>+				op_unmap_new_to_list(&ops->list, va, merge);
>+				goto next;
>+			}
>+
>+			if (end > req_end) {
>+				struct drm_gpuva_op_map n = {
>+					.va.addr = req_end,
>+					.va.range = end - req_end,
>+					.gem.obj = obj,
>+					.gem.offset = offset + req_end - addr,
>+				};
>+				struct drm_gpuva_op_unmap u = { .va = va };
>+				u64 new_end = end;
>+				u64 new_range = new_end - req_addr;
>+
>+				/* We validated that the requested mapping is
>+				 * within a single VA region already.
>+				 * Since it overlaps the current mapping (which
>+				 * can't cross a VA region boundary) we can be
>+				 * sure that we're still within the boundaries
>+				 * of the same VA region after merging.
>+				 */
>+				if (merge) {
>+					req_end = new_end;
>+					req_range = new_range;
>+					op_unmap_new_to_list(&ops->list, va, true);
>+					break;
>+				}
>+
>+				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>+				break;
>+			}
>+		}
>+next:
>+		prev = va;
>+	}
>+
>+	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
>+	if (va) {
>+		struct drm_gem_object *obj = va->gem.obj;
>+		u64 offset = va->gem.offset;
>+		u64 addr = va->node.start;
>+		u64 range = va->node.size;
>+		u64 new_offset = offset;
>+		u64 new_addr = addr;
>+		u64 new_range = req_range + range;
>+		bool merge = obj == req_obj &&
>+			     offset + range == req_offset;
>+
>+		/* Don't merge over VA region boundaries. */
>+		merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
>+		if (merge) {
>+			op_unmap_new_to_list(&ops->list, va, true);
>+
>+			req_offset = new_offset;
>+			req_addr = new_addr;
>+			req_range = new_range;
>+		}
>+	}
>+
>+	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
>+	if (va) {
>+		struct drm_gem_object *obj = va->gem.obj;
>+		u64 offset = va->gem.offset;
>+		u64 addr = va->node.start;
>+		u64 range = va->node.size;
>+		u64 end = addr + range;
>+		u64 new_range = req_range + range;
>+		u64 new_end = end;
>+		bool merge = obj == req_obj &&
>+			     offset == req_offset + req_range;
>+
>+		/* Don't merge over VA region boundaries. */
>+		merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
>+		if (merge) {
>+			op_unmap_new_to_list(&ops->list, va, true);
>+
>+			req_range = new_range;
>+			req_end = new_end;
>+		}
>+	}
>+
>+	op_map_new_to_list(&ops->list,
>+			   req_addr, req_range,
>+			   req_obj, req_offset);
>+
>+done:
>+	return ops;
>+
>+err_free_ops:
>+	drm_gpuva_ops_free(ops);
>+	return ERR_PTR(ret);
>+}
>+EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
>+
>+#undef op_map_new_to_list
>+#undef op_remap_new_to_list
>+#undef op_unmap_new_to_list
>+
>+/**
>+ * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
>+ * @mgr: the &drm_gpuva_manager representing the GPU VA space
>+ * @req_addr: the start address of the range to unmap
>+ * @req_range: the range of the mappings to unmap
>+ *
>+ * This function creates a list of operations to perform unmapping and, if
>+ * required, splitting of the mappings overlapping the unmap range.
>+ *
>+ * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>+ * in the given order. It can contain unmap and remap operations, depending on
>+ * whether there are actual overlapping mappings to split.
>+ *
>+ * There can be an arbitrary amount of unmap operations and a maximum of two
>+ * remap operations.
>+ *
>+ * Note that before calling this function again with another range to unmap it
>+ * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>+ * The previously obtained operations must be processed or abandoned.
>+ * To update the &drm_gpuva_manager's view of the GPU VA space
>+ * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>+ * drm_gpuva_destroy_unlocked() should be used.
>+ *
>+ * After the caller finished processing the returned &drm_gpuva_ops, they must
>+ * be freed with &drm_gpuva_ops_free.
>+ *
>+ * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>+ */
>+struct drm_gpuva_ops *
>+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>+			      u64 req_addr, u64 req_range)
>+{
>+	struct drm_gpuva_ops *ops;
>+	struct drm_gpuva_op *op;
>+	struct drm_gpuva_op_remap *r;
>+	struct drm_gpuva *va;
>+	u64 req_end = req_addr + req_range;
>+	int ret;
>+
>+	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>+	if (!ops)
>+		return ERR_PTR(-ENOMEM);
>+
>+	INIT_LIST_HEAD(&ops->list);
>+
>+	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>+		struct drm_gem_object *obj = va->gem.obj;
>+		u64 offset = va->gem.offset;
>+		u64 addr = va->node.start;
>+		u64 range = va->node.size;
>+		u64 end = addr + range;
>+
>+		op = kzalloc(sizeof(*op), GFP_KERNEL);
>+		if (!op) {
>+			ret = -ENOMEM;
>+			goto err_free_ops;
>+		}
>+
>+		r = &op->remap;
>+
>+		if (addr < req_addr) {
>+			r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
>+			if (!r->prev) {
>+				ret = -ENOMEM;
>+				goto err_free_op;
>+			}
>+
>+			r->prev->va.addr = addr;
>+			r->prev->va.range = req_addr - addr;
>+			r->prev->gem.obj = obj;
>+			r->prev->gem.offset = offset;
>+		}
>+
>+		if (end > req_end) {
>+			r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
>+			if (!r->next) {
>+				ret = -ENOMEM;
>+				goto err_free_prev;
>+			}
>+
>+			r->next->va.addr = req_end;
>+			r->next->va.range = end - req_end;
>+			r->next->gem.obj = obj;
>+			r->next->gem.offset = offset + (req_end - addr);
>+		}
>+
>+		if (op->remap.prev || op->remap.next) {
>+			op->op = DRM_GPUVA_OP_REMAP;
>+			r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
>+			if (!r->unmap) {
>+				ret = -ENOMEM;
>+				goto err_free_next;
>+			}
>+
>+			r->unmap->va = va;
>+		} else {
>+			op->op = DRM_GPUVA_OP_UNMAP;
>+			op->unmap.va = va;
>+		}
>+
>+		list_add_tail(&op->entry, &ops->list);
>+	}
>+
>+	return ops;
>+
>+err_free_next:
>+	if (r->next)
>+		kfree(r->next);
>+err_free_prev:
>+	if (r->prev)
>+		kfree(r->prev);
>+err_free_op:
>+	kfree(op);
>+err_free_ops:
>+	drm_gpuva_ops_free(ops);
>+	return ERR_PTR(ret);
>+}
>+EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
>+
>+/**
>+ * drm_gpuva_ops_free - free the given &drm_gpuva_ops
>+ * @ops: the &drm_gpuva_ops to free
>+ *
>+ * Frees the given &drm_gpuva_ops structure including all the ops associated
>+ * with it.
>+ */
>+void
>+drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
>+{
>+	struct drm_gpuva_op *op, *next;
>+
>+	drm_gpuva_for_each_op_safe(op, next, ops) {
>+		list_del(&op->entry);
>+		if (op->op == DRM_GPUVA_OP_REMAP) {
>+			if (op->remap.prev)
>+				kfree(op->remap.prev);
>+
>+			if (op->remap.next)
>+				kfree(op->remap.next);
>+
>+			kfree(op->remap.unmap);
>+		}
>+		kfree(op);
>+	}
>+
>+	kfree(ops);
>+}
>+EXPORT_SYMBOL(drm_gpuva_ops_free);
>diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>index d7c521e8860f..6feacd93aca6 100644
>--- a/include/drm/drm_drv.h
>+++ b/include/drm/drm_drv.h
>@@ -104,6 +104,12 @@ enum drm_driver_feature {
> 	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
> 	 */
> 	DRIVER_COMPUTE_ACCEL            = BIT(7),
>+	/**
>+	 * @DRIVER_GEM_GPUVA:
>+	 *
>+	 * Driver supports user defined GPU VA bindings for GEM objects.
>+	 */
>+	DRIVER_GEM_GPUVA		= BIT(8),
>
> 	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
>
>diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>index 772a4adf5287..4a3679034966 100644
>--- a/include/drm/drm_gem.h
>+++ b/include/drm/drm_gem.h
>@@ -36,6 +36,8 @@
>
> #include <linux/kref.h>
> #include <linux/dma-resv.h>
>+#include <linux/list.h>
>+#include <linux/mutex.h>
>
> #include <drm/drm_vma_manager.h>
>
>@@ -337,6 +339,17 @@ struct drm_gem_object {
> 	 */
> 	struct dma_resv _resv;
>
>+	/**
>+	 * @gpuva:
>+	 *
>+	 * Provides the list and list mutex of GPU VAs attached to this
>+	 * GEM object.
>+	 */
>+	struct {
>+		struct list_head list;
>+		struct mutex mutex;
>+	} gpuva;
>+
> 	/**
> 	 * @funcs:
> 	 *
>@@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
> unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
> 			       bool (*shrink)(struct drm_gem_object *obj));
>
>+/**
>+ * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
>+ * @obj: the &drm_gem_object
>+ *
>+ * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
>+ * protecting it.
>+ *
>+ * Calling this function is only necessary for drivers intending to support the
>+ * &drm_driver_feature DRIVER_GEM_GPUVA.
>+ */
>+static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>+{
>+	INIT_LIST_HEAD(&obj->gpuva.list);
>+	mutex_init(&obj->gpuva.mutex);
>+}
>+
>+/**
>+ * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
>+ * @obj: the &drm_gem_object
>+ *
>+ * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>+ */
>+static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
>+{
>+	mutex_lock(&obj->gpuva.mutex);
>+}
>+
>+/**
>+ * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
>+ * @obj: the &drm_gem_object
>+ *
>+ * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>+ */
>+static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
>+{
>+	mutex_unlock(&obj->gpuva.mutex);
>+}
>+
>+/**
>+ * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
>+ * @entry: &drm_gpuva structure to assign to in each iteration step
>+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>+ *
>+ * This iterator walks over all &drm_gpuva structures associated with the
>+ * &drm_gem_object.
>+ */
>+#define drm_gem_for_each_gpuva(entry, obj) \
>+	list_for_each_entry(entry, &obj->gpuva.list, head)
>+
>+/**
>+ * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
>+ * @entry: &drm_gpuva structure to assign to in each iteration step
>+ * @next: &next &drm_gpuva to store the next step
>+ * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>+ *
>+ * This iterator walks over all &drm_gpuva structures associated with the
>+ * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
>+ * it is safe against removal of elements.
>+ */
>+#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
>+	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
>+
> #endif /* __DRM_GEM_H__ */
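
On the GEM side, I'd then expect the iteration to look like this (sketch):

        struct drm_gpuva *va;

        drm_gem_gpuva_lock(obj);
        drm_gem_for_each_gpuva(va, obj) {
                /* e.g. invalidate the mapping when the GEM object gets evicted */
        }
        drm_gem_gpuva_unlock(obj);
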
>diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
>new file mode 100644
>index 000000000000..adeb0c916e91
>--- /dev/null
>+++ b/include/drm/drm_gpuva_mgr.h
>@@ -0,0 +1,527 @@
>+// SPDX-License-Identifier: GPL-2.0
>+
>+#ifndef __DRM_GPUVA_MGR_H__
>+#define __DRM_GPUVA_MGR_H__
>+
>+/*
>+ * Copyright (c) 2022 Red Hat.
>+ *
>+ * Permission is hereby granted, free of charge, to any person obtaining a
>+ * copy of this software and associated documentation files (the "Software"),
>+ * to deal in the Software without restriction, including without limitation
>+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>+ * and/or sell copies of the Software, and to permit persons to whom the
>+ * Software is furnished to do so, subject to the following conditions:
>+ *
>+ * The above copyright notice and this permission notice shall be included in
>+ * all copies or substantial portions of the Software.
>+ *
>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>+ * OTHER DEALINGS IN THE SOFTWARE.
>+ */
>+
>+#include <drm/drm_mm.h>
>+#include <linux/mm.h>
>+#include <linux/rbtree.h>
>+#include <linux/spinlock.h>
>+#include <linux/types.h>
>+
>+struct drm_gpuva_region;
>+struct drm_gpuva;
>+struct drm_gpuva_ops;
>+
>+/**
>+ * struct drm_gpuva_manager - DRM GPU VA Manager
>+ *
>+ * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
>+ * the &drm_mm range allocator. Typically, this structure is embedded in bigger
>+ * driver structures.
>+ *
>+ * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
>+ * pages.
>+ *
>+ * There should be one manager instance per GPU virtual address space.
>+ */
>+struct drm_gpuva_manager {
>+	/**
>+	 * @name: the name of the DRM GPU VA space
>+	 */
>+	const char *name;
>+
>+	/**
>+	 * @mm_start: start of the VA space
>+	 */
>+	u64 mm_start;
>+
>+	/**
>+	 * @mm_range: length of the VA space
>+	 */
>+	u64 mm_range;
>+
>+	/**
>+	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
>+	 */
>+	struct drm_mm region_mm;
>+
>+	/**
>+	 * @va_mm: the &drm_mm range allocator to track GPU VA mappings
>+	 */
>+	struct drm_mm va_mm;
>+
>+	/**
>+	 * @kernel_alloc_node:
>+	 *
>+	 * &drm_mm_node representing the address space cutout reserved for
>+	 * the kernel
>+	 */
>+	struct drm_mm_node kernel_alloc_node;
>+};
>+
>+void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>+			    const char *name,
>+			    u64 start_offset, u64 range,
>+			    u64 reserve_offset, u64 reserve_range);
>+void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
>+
>+/**
>+ * struct drm_gpuva_region - structure to track a portion of GPU VA space
>+ *
>+ * This structure represents a portion of a GPUs VA space and is associated
>+ * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>+ *
>+ * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
>+ * placed within a &drm_gpuva_region.
>+ */
>+struct drm_gpuva_region {
>+	/**
>+	 * @node: the &drm_mm_node to track the GPU VA region
>+	 */
>+	struct drm_mm_node node;
>+
>+	/**
>+	 * @mgr: the &drm_gpuva_manager this object is associated with
>+	 */
>+	struct drm_gpuva_manager *mgr;
>+
>+	/**
>+	 * @sparse: indicates whether this region is sparse
>+	 */
>+	bool sparse;
>+};
>+
>+struct drm_gpuva_region *
>+drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>+		      u64 addr, u64 range);
>+int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>+			    struct drm_gpuva_region *reg,
>+			    u64 addr, u64 range);
>+void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>+			      struct drm_gpuva_region *reg);
>+
>+int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>+		     struct drm_gpuva *va,
>+		     u64 addr, u64 range);
>+/**
>+ * drm_gpuva_for_each_region_in_range - iterator to walk over a range of nodes
>+ * @node__: &drm_gpuva_region structure to assign to in each iteration step
>+ * @gpuva__: &drm_gpuva_manager structure to walk
>+ * @start__: starting offset, the first node will overlap this
>+ * @end__: ending offset, the last node will start before this (but may overlap)
>+ *
>+ * This iterator walks over all nodes in the range allocator that lie
>+ * between @start and @end. It is implemented similarly to list_for_each(),
>+ * but is using &drm_mm's internal interval tree to accelerate the search for
>+ * the starting node, and hence isn't safe against removal of elements. It
>+ * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>+ * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>+ * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>+ * backing &drm_mm, and may even continue indefinitely.
>+ */
>+#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
>+	for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
>+									 (start__), (end__)-1); \
>+	     node__->node.start < (end__); \
>+	     node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
>+
>+/**
>+ * drm_gpuva_for_each_region - iterator to walk over all regions
>+ * @entry: &drm_gpuva_region structure to assign to in each iteration step
>+ * @gpuva: &drm_gpuva_manager structure to walk
>+ *
>+ * This iterator walks over all &drm_gpuva_region structures associated with the
>+ * &drm_gpuva_manager.
>+ */
>+#define drm_gpuva_for_each_region(entry, gpuva) \
>+	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>+
>+/**
>+ * drm_gpuva_for_each_region_safe - iterator to safely walk over all regions
>+ * @entry: &drm_gpuva_region structure to assign to in each iteration step
>+ * @next: &next &drm_gpuva_region to store the next step
>+ * @gpuva: &drm_gpuva_manager structure to walk
>+ *
>+ * This iterator walks over all &drm_gpuva_region structures associated with the
>+ * &drm_gpuva_manager. It is implemented with list_for_each_safe(), hence it
>+ * is safe against removal of elements.
>+ */
>+#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
>+	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>+
>+
>+/**
>+ * enum drm_gpuva_flags - flags for struct drm_gpuva
>+ */
>+enum drm_gpuva_flags {
>+	/**
>+	 * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is swapped
>+	 */
>+	DRM_GPUVA_SWAPPED = (1 << 0),
>+};
>+
>+/**
>+ * struct drm_gpuva - structure to track a GPU VA mapping
>+ *
>+ * This structure represents a GPU VA mapping and is associated with a
>+ * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>+ *
>+ * Typically, this structure is embedded in bigger driver structures.
>+ */
>+struct drm_gpuva {
>+	/**
>+	 * @node: the &drm_mm_node to track the GPU VA mapping
>+	 */
>+	struct drm_mm_node node;
>+
>+	/**
>+	 * @mgr: the &drm_gpuva_manager this object is associated with
>+	 */
>+	struct drm_gpuva_manager *mgr;
>+
>+	/**
>+	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
>+	 */
>+	struct drm_gpuva_region *region;
>+
>+	/**
>+	 * @head: the &list_head to attach this object to a &drm_gem_object
>+	 */
>+	struct list_head head;
>+
>+	/**
>+	 * @flags: the &drm_gpuva_flags for this mapping
>+	 */
>+	enum drm_gpuva_flags flags;
>+
>+	/**
>+	 * @gem: structure containing the &drm_gem_object and its offset
>+	 */
>+	struct {
>+		/**
>+		 * @offset: the offset within the &drm_gem_object
>+		 */
>+		u64 offset;
>+
>+		/**
>+		 * @obj: the mapped &drm_gem_object
>+		 */
>+		struct drm_gem_object *obj;
>+	} gem;
>+};
>+
>+void drm_gpuva_link_locked(struct drm_gpuva *va);
>+void drm_gpuva_link_unlocked(struct drm_gpuva *va);
>+void drm_gpuva_unlink_locked(struct drm_gpuva *va);
>+void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
>+
>+void drm_gpuva_destroy_locked(struct drm_gpuva *va);
>+void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
>+
>+struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
>+				 u64 addr, u64 range);
>+struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
>+struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
>+
>+/**
>+ * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva is swapped
>+ * @va: the &drm_gpuva to set the swap flag of
>+ * @swap: indicates whether the &drm_gpuva is swapped
>+ */
>+static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
>+{
>+	if (swap)
>+		va->flags |= DRM_GPUVA_SWAPPED;
>+	else
>+		va->flags &= ~DRM_GPUVA_SWAPPED;
>+}
>+
>+/**
>+ * drm_gpuva_swapped - indicates whether the backing BO of this &drm_gpuva
>+ * is swapped
>+ * @va: the &drm_gpuva to check
>+ */
>+static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
>+{
>+	return va->flags & DRM_GPUVA_SWAPPED;
>+}
>+
>+/**
>+ * drm_gpuva_for_each_va_in_range - iterator to walk over a range of nodes
>+ * @node__: &drm_gpuva structure to assign to in each iteration step
>+ * @gpuva__: &drm_gpuva_manager structure to walk
>+ * @start__: starting offset, the first node will overlap this
>+ * @end__: ending offset, the last node will start before this (but may overlap)
>+ *
>+ * This iterator walks over all nodes in the range allocator that lie
>+ * between @start and @end. It is implemented similarly to list_for_each(),
>+ * but is using &drm_mm's internal interval tree to accelerate the search for
>+ * the starting node, and hence isn't safe against removal of elements. It
>+ * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>+ * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>+ * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>+ * backing &drm_mm, and may even continue indefinitely.
>+ */
>+#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
>+	for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
>+								  (start__), (end__)-1); \
>+	     node__->node.start < (end__); \
>+	     node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
>+
>+/**
>+ * drm_gpuva_for_each_va - iterator to walk over all VAs
>+ * @entry: &drm_gpuva structure to assign to in each iteration step
>+ * @gpuva: &drm_gpuva_manager structure to walk
>+ *
>+ * This iterator walks over all &drm_gpuva structures associated with the
>+ * &drm_gpuva_manager.
>+ */
>+#define drm_gpuva_for_each_va(entry, gpuva) \
>+	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>+
>+/**
>+ * drm_gpuva_for_each_va_safe - iterator to safely walk over all VAs
>+ * @entry: &drm_gpuva structure to assign to in each iteration step
>+ * @next: &next &drm_gpuva to store the next step
>+ * @gpuva: &drm_gpuva_manager structure to walk
>+ *
>+ * This iterator walks over all &drm_gpuva structures associated with the
>+ * &drm_gpuva_manager. It is implemented with list_for_each_safe(), hence it
>+ * is safe against removal of elements.
>+ */
>+#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
>+	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>+
>+/**
>+ * enum drm_gpuva_op_type - GPU VA operation type
>+ *
>+ * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager
>+ * can be map, remap or unmap operations.
>+ */
>+enum drm_gpuva_op_type {
>+	/**
>+	 * @DRM_GPUVA_OP_MAP: the map op type
>+	 */
>+	DRM_GPUVA_OP_MAP,
>+
>+	/**
>+	 * @DRM_GPUVA_OP_REMAP: the remap op type
>+	 */
>+	DRM_GPUVA_OP_REMAP,
>+
>+	/**
>+	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
>+	 */
>+	DRM_GPUVA_OP_UNMAP,
>+};
>+
>+/**
>+ * struct drm_gpuva_op_map - GPU VA map operation
>+ *
>+ * This structure represents a single map operation generated by the
>+ * DRM GPU VA manager.
>+ */
>+struct drm_gpuva_op_map {
>+	/**
>+	 * @va: structure containing address and range of a map
>+	 * operation
>+	 */
>+	struct {
>+		/**
>+		 * @addr: the base address of the new mapping
>+		 */
>+		u64 addr;
>+
>+		/**
>+		 * @range: the range of the new mapping
>+		 */
>+		u64 range;
>+	} va;
>+
>+	/**
>+	 * @gem: structure containing the &drm_gem_object and its offset
>+	 */
>+	struct {
>+		/**
>+		 * @offset: the offset within the &drm_gem_object
>+		 */
>+		u64 offset;
>+
>+		/**
>+		 * @obj: the &drm_gem_object to map
>+		 */
>+		struct drm_gem_object *obj;
>+	} gem;
>+};
>+
>+/**
>+ * struct drm_gpuva_op_unmap - GPU VA unmap operation
>+ *
>+ * This structure represents a single unmap operation generated by the
>+ * DRM GPU VA manager.
>+ */
>+struct drm_gpuva_op_unmap {
>+	/**
>+	 * @va: the &drm_gpuva to unmap
>+	 */
>+	struct drm_gpuva *va;
>+
>+	/**
>+	 * @keep:
>+	 *
>+	 * Indicates whether this &drm_gpuva is physically contiguous with the
>+	 * original mapping request.
>+	 *
>+	 * Optionally, if &keep is set, drivers may keep the actual page table
>+	 * mappings for this &drm_gpuva, only adding the missing page table
>+	 * entries, and update the &drm_gpuva_manager accordingly.
>+	 */
>+	bool keep;
>+};
>+
>+/**
>+ * struct drm_gpuva_op_remap - GPU VA remap operation
>+ *
>+ * This represents a single remap operation generated by the DRM GPU VA manager.
>+ *
>+ * A remap operation is generated when an existing GPU VA mapping is split up
>+ * by inserting a new GPU VA mapping or by partially unmapping existing
>+ * mapping(s), hence it consists of a maximum of two map and one unmap
>+ * operation.
>+ *
>+ * The @unmap operation takes care of removing the original existing mapping.
>+ * @prev is used to remap the preceding part, @next the subsequent part.
>+ *
>+ * If either a new mapping's start address is aligned with the start address
>+ * of the old mapping or the new mapping's end address is aligned with the
>+ * end address of the old mapping, either @prev or @next is NULL.
>+ *
>+ * Note, the reason for a dedicated remap operation, rather than arbitrary
>+ * unmap and map operations, is to give drivers the chance of extracting driver
>+ * specific data for creating the new mappings from the unmap operation's
>+ * &drm_gpuva structure which typically is embedded in larger driver specific
>+ * structures.
>+ */
>+struct drm_gpuva_op_remap {
>+	/**
>+	 * @prev: the preceding part of a split mapping
>+	 */
>+	struct drm_gpuva_op_map *prev;
>+
>+	/**
>+	 * @next: the subsequent part of a split mapping
>+	 */
>+	struct drm_gpuva_op_map *next;
>+
>+	/**
>+	 * @unmap: the unmap operation for the original existing mapping
>+	 */
>+	struct drm_gpuva_op_unmap *unmap;
>+};
>+
>+/**
>+ * struct drm_gpuva_op - GPU VA operation
>+ *
>+ * This structure represents a single generic operation, which can be either
>+ * map, unmap or remap.
>+ *
>+ * The particular type of the operation is defined by @op.
>+ */
>+struct drm_gpuva_op {
>+	/**
>+	 * @entry:
>+	 *
>+	 * The &list_head used to distribute instances of this struct within
>+	 * &drm_gpuva_ops.
>+	 */
>+	struct list_head entry;
>+
>+	/**
>+	 * @op: the type of the operation
>+	 */
>+	enum drm_gpuva_op_type op;
>+
>+	union {
>+		/**
>+		 * @map: the map operation
>+		 */
>+		struct drm_gpuva_op_map map;
>+
>+		/**
>+		 * @unmap: the unmap operation
>+		 */
>+		struct drm_gpuva_op_unmap unmap;
>+
>+		/**
>+		 * @remap: the remap operation
>+		 */
>+		struct drm_gpuva_op_remap remap;
>+	};
>+};
>+
>+/**
>+ * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
>+ */
>+struct drm_gpuva_ops {
>+	/**
>+	 * @list: the &list_head
>+	 */
>+	struct list_head list;
>+};
>+
>+/**
>+ * drm_gpuva_for_each_op - iterator to walk over all ops
>+ * @op: &drm_gpuva_op to assign in each iteration step
>+ * @ops: &drm_gpuva_ops to walk
>+ *
>+ * This iterator walks over all ops within a given list of operations.
>+ */
>+#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
>+
>+/**
>+ * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
>+ * @op: &drm_gpuva_op to assign in each iteration step
>+ * @next: &next &drm_gpuva_op to store the next step
>+ * @ops: &drm_gpuva_ops to walk
>+ *
>+ * This iterator walks over all ops within a given list of operations. It is
>+ * implemented with list_for_each_safe(), hence it is safe against removal of elements.
>+ */
>+#define drm_gpuva_for_each_op_safe(op, next, ops) \
>+	list_for_each_entry_safe(op, next, &(ops)->list, entry)
>+
>+struct drm_gpuva_ops *
>+drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>+			    u64 addr, u64 range,
>+			    struct drm_gem_object *obj, u64 offset);
>+struct drm_gpuva_ops *
>+drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>+			      u64 addr, u64 range);
>+void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
>+
>+#endif /* __DRM_GPUVA_MGR_H__ */
>-- 
>2.39.0
>
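
To make the map/remap/unmap operations documented above a bit more concrete,
here is a minimal, hedged sketch of how a driver might consume the operations
generated for a mapping request. sketch_driver_bind(), my_driver_map() and
my_driver_unmap() are made-up placeholders, and the ERR_PTR() return
convention assumed for drm_gpuva_sm_map_ops_create() is not visible in the
hunk above:

	/* hypothetical driver hooks, not part of the patch */
	static void my_driver_map(struct drm_gpuva_op_map *op);
	static void my_driver_unmap(struct drm_gpuva_op_unmap *op);

	static int sketch_driver_bind(struct drm_gpuva_manager *mgr,
				      u64 addr, u64 range,
				      struct drm_gem_object *obj, u64 offset)
	{
		struct drm_gpuva_ops *ops;
		struct drm_gpuva_op *op;

		/* assumption: ERR_PTR() on failure */
		ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
		if (IS_ERR(ops))
			return PTR_ERR(ops);

		drm_gpuva_for_each_op(op, ops) {
			switch (op->op) {
			case DRM_GPUVA_OP_MAP:
				my_driver_map(&op->map);
				break;
			case DRM_GPUVA_OP_REMAP:
				/* tear down the old mapping, re-create the remainders */
				my_driver_unmap(op->remap.unmap);
				if (op->remap.prev)
					my_driver_map(op->remap.prev);
				if (op->remap.next)
					my_driver_map(op->remap.next);
				break;
			case DRM_GPUVA_OP_UNMAP:
				my_driver_unmap(&op->unmap);
				break;
			}
		}

		drm_gpuva_ops_free(ops);
		return 0;
	}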

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
  2023-01-19  4:14   ` Bagas Sanjaya
  2023-01-23 23:23   ` Niranjana Vishwanathapura
@ 2023-01-26 23:43   ` Matthew Brost
  2023-01-27  0:24   ` Matthew Brost
  2023-02-03 17:37   ` Matthew Brost
  4 siblings, 0 replies; 75+ messages in thread
From: Matthew Brost @ 2023-01-26 23:43 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
> This adds the infrastructure for a manager implementation to keep track
> of GPU virtual address (VA) mappings.
> 
> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> start implementing, allow userspace applications to request multiple and
> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
> 
> 1) Provide a dedicated range allocator to track GPU VA allocations and
>    mappings, making use of the drm_mm range allocator.
> 
> 2) Generically connect GPU VA mappings to their backing buffers, in
>    particular DRM GEM objects.
> 
> 3) Provide a common implementation to perform more complex mapping
>    operations on the GPU VA space. In particular splitting and merging
>    of GPU VA mappings, e.g. for intersecting mapping requests or partial
>    unmap requests.
> 
> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>

<snip>

> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> new file mode 100644
> index 000000000000..adeb0c916e91
> --- /dev/null
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -0,0 +1,527 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#ifndef __DRM_GPUVA_MGR_H__
> +#define __DRM_GPUVA_MGR_H__
> +
> +/*
> + * Copyright (c) 2022 Red Hat.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <drm/drm_mm.h>
> +#include <linux/mm.h>
> +#include <linux/rbtree.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +struct drm_gpuva_region;
> +struct drm_gpuva;
> +struct drm_gpuva_ops;
> +
> +/**
> + * struct drm_gpuva_manager - DRM GPU VA Manager
> + *
> + * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
> + * the &drm_mm range allocator. Typically, this structure is embedded in bigger
> + * driver structures.
> + *
> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
> + * pages.
> + *
> + * There should be one manager instance per GPU virtual address space.
> + */
> +struct drm_gpuva_manager {
> +	/**
> +	 * @name: the name of the DRM GPU VA space
> +	 */
> +	const char *name;
> +
> +	/**
> +	 * @mm_start: start of the VA space
> +	 */
> +	u64 mm_start;
> +
> +	/**
> +	 * @mm_range: length of the VA space
> +	 */
> +	u64 mm_range;
> +
> +	/**
> +	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
> +	 */
> +	struct drm_mm region_mm;
> +

I'd suggest using an rb_tree rather than drm_mm; it should be quite a
bit more lightweight - that is what we currently use in Xe for VM / VMA
management.

See lines 994-1056 in the following file:
https://cgit.freedesktop.org/drm/drm-xe/tree/drivers/gpu/drm/xe/xe_vm.c?h=drm-xe-next

I'm pretty sure all of your magic macros (drm_gpuva_for_each*) can
easily be implemented using an rb_tree too.

Matt
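
For readers following along, a rough sketch of what such an rbtree-backed
lookup could look like is below. This is purely illustrative (the struct and
field names are made up) and is not how Xe actually does it - see the
xe_vm.c link above for the real code:

	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct sketch_gpuva {
		struct rb_node rb;	/* keyed by start address */
		u64 addr;
		u64 range;
	};

	static struct sketch_gpuva *
	sketch_gpuva_find(struct rb_root *root, u64 addr, u64 range)
	{
		struct rb_node *node = root->rb_node;

		while (node) {
			struct sketch_gpuva *va =
				rb_entry(node, struct sketch_gpuva, rb);

			if (addr < va->addr)
				node = node->rb_left;
			else if (addr >= va->addr + va->range)
				node = node->rb_right;
			else	/* mappings don't overlap, so this is the only hit */
				return (va->addr == addr &&
					va->range == range) ? va : NULL;
		}

		return NULL;
	}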

<snip>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
                     ` (2 preceding siblings ...)
  2023-01-26 23:43   ` Matthew Brost
@ 2023-01-27  0:24   ` Matthew Brost
  2023-02-03 17:37   ` Matthew Brost
  4 siblings, 0 replies; 75+ messages in thread
From: Matthew Brost @ 2023-01-27  0:24 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
> This adds the infrastructure for a manager implementation to keep track
> of GPU virtual address (VA) mappings.
> 
> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
> start implementing, allow userspace applications to request multiple and
> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
> intended to serve the following purposes in this context.
> 
> 1) Provide a dedicated range allocator to track GPU VA allocations and
>    mappings, making use of the drm_mm range allocator.
> 
> 2) Generically connect GPU VA mappings to their backing buffers, in
>    particular DRM GEM objects.
> 
> 3) Provide a common implementation to perform more complex mapping
>    operations on the GPU VA space. In particular splitting and merging
>    of GPU VA mappings, e.g. for intersecting mapping requests or partial
>    unmap requests.
> 
> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>

<snip>

> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c

<snip>

> +struct drm_gpuva *
> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +	       u64 addr, u64 range)
> +{
> +	struct drm_gpuva *va;
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {

Last argument should be: range + addr, right?

> +		if (va->node.start == addr &&
> +		    va->node.size == range)
> +			return va;
> +	}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find);
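
Assuming Matt is right and the iterator takes an exclusive end address rather
than a range, the fixed-up loop would presumably read as follows (a sketch,
not an authoritative fix):

	drm_gpuva_for_each_va_in_range(va, mgr, addr, addr + range) {
		if (va->node.start == addr &&
		    va->node.size == range)
			return va;
	}
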
> +
> +/**
> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @start: the given GPU VA's start address
> + *
> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> +{
> +	struct drm_mm_node *node;
> +
> +	if (start <= mgr->mm_start ||
> +	    start > (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
> +	if (node == &mgr->va_mm.head_node)
> +		return NULL;
> +
> +	return (struct drm_gpuva *)node;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_prev);
> +
> +/**
> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @end: the given GPU VA's end address
> + *
> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
> +{
> +	struct drm_mm_node *node;
> +
> +	if (end < mgr->mm_start ||
> +	    end >= (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
> +	if (node == &mgr->va_mm.head_node)
> +		return NULL;
> +
> +	return (struct drm_gpuva *)node;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_next);
> +
> +/**
> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @reg: the &drm_gpuva_region to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva_region with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			struct drm_gpuva_region *reg,
> +			u64 addr, u64 range)
> +{
> +	int ret;
> +
> +	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
> +					  range, 0,
> +					  0, addr,
> +					  addr + range,
> +					  DRM_MM_INSERT_LOW|
> +					  DRM_MM_INSERT_ONCE);
> +	if (ret)
> +		return ret;
> +
> +	reg->mgr = mgr;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_insert);
> +
> +/**
> + * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager holding the region
> + * @reg: the &drm_gpuva to destroy
> + *
> + * This removes the given &reg from the underlaying range allocator.
> + */
> +void
> +drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
> +			 struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva *va;
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr,
> +				       reg->node.start,
> +				       reg->node.size) {

Last argument should be: reg->node.start + reg->node.size, right?

Matt

> +		WARN(1, "GPU VA region must be empty on destroy.\n");
> +		return;
> +	}
> +
> +	if (&reg->node == &mgr->kernel_alloc_node) {
> +		WARN(1, "Can't destroy kernel reserved region.\n");
> +		return;
> +	}
> +
> +	drm_mm_remove_node(&reg->node);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_destroy);
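
Same remark as for drm_gpuva_find() above - with an exclusive end address the
emptiness check would presumably become (sketch):

	drm_gpuva_for_each_va_in_range(va, mgr,
				       reg->node.start,
				       reg->node.start + reg->node.size) {
		WARN(1, "GPU VA region must be empty on destroy.\n");
		return;
	}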

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-18  6:12 ` [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
@ 2023-01-27  1:05   ` Matthew Brost
  2023-01-27  1:26     ` Danilo Krummrich
  2023-01-27  1:43     ` Danilo Krummrich
  0 siblings, 2 replies; 75+ messages in thread
From: Matthew Brost @ 2023-01-27  1:05 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
> This commit provides the interfaces for the new UAPI motivated by the
> Vulkan API. It allows user mode drivers (UMDs) to:
> 
> 1) Initialize a GPU virtual address (VA) space via the new
>    DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>    VA area.
> 
> 2) Bind and unbind GPU VA space mappings via the new
>    DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> 
> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> 
> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
> asynchronous processing with DRM syncobjs as synchronization mechanism.
> 
> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> 
> Co-authored-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  Documentation/gpu/driver-uapi.rst |   8 ++
>  include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
>  2 files changed, 224 insertions(+)
> 
> diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
> index 4411e6919a3d..9c7ca6e33a68 100644
> --- a/Documentation/gpu/driver-uapi.rst
> +++ b/Documentation/gpu/driver-uapi.rst
> @@ -6,3 +6,11 @@ drm/i915 uAPI
>  =============
>  
>  .. kernel-doc:: include/uapi/drm/i915_drm.h
> +
> +drm/nouveau uAPI
> +================
> +
> +VM_BIND / EXEC uAPI
> +-------------------
> +
> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> index 853a327433d3..f6e7d40201d4 100644
> --- a/include/uapi/drm/nouveau_drm.h
> +++ b/include/uapi/drm/nouveau_drm.h
> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>  	__u32 handle;
>  };
>  
> +/**
> + * struct drm_nouveau_sync - sync object
> + *
> + * This structure serves as a synchronization mechanism for (potentially)
> + * asynchronous operations such as EXEC or VM_BIND.
> + */
> +struct drm_nouveau_sync {
> +	/**
> +	 * @flags: the flags for a sync object
> +	 *
> +	 * The first 8 bits are used to determine the type of the sync object.
> +	 */
> +	__u32 flags;
> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
> +	/**
> +	 * @handle: the handle of the sync object
> +	 */
> +	__u32 handle;
> +	/**
> +	 * @timeline_value:
> +	 *
> +	 * The timeline point of the sync object in case the syncobj is of
> +	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
> +	 */
> +	__u64 timeline_value;
> +};
> +
> +/**
> + * struct drm_nouveau_vm_init - GPU VA space init structure
> + *
> + * Used to initialize the GPU's VA space for a user client, telling the kernel
> + * which portion of the VA space is managed by the UMD and kernel respectively.
> + */
> +struct drm_nouveau_vm_init {
> +	/**
> +	 * @unmanaged_addr: start address of the kernel managed VA space region
> +	 */
> +	__u64 unmanaged_addr;
> +	/**
> +	 * @unmanaged_size: size of the kernel managed VA space region in bytes
> +	 */
> +	__u64 unmanaged_size;
> +};
> +
> +/**
> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
> + *
> + * This structure represents a single VM_BIND operation. UMDs should pass
> + * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
> + */
> +struct drm_nouveau_vm_bind_op {
> +	/**
> +	 * @op: the operation type
> +	 */
> +	__u32 op;
> +/**
> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
> + *
> + * The alloc operation is used to reserve a VA space region within the GPU's VA
> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
> + * instruct the kernel to create sparse mappings for the given region.
> + */
> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0

Do you really need this operation? We have no concept of this in Xe;
we just create a VM and the entire address space is managed exactly
the same.

If this can be removed then the entire concept of regions in the GPUVA
manager can be removed too (drop struct drm_gpuva_region). I say this
because in Xe, as I'm porting over to GPUVA, the first thing I'm doing
after drm_gpuva_manager_init() is calling drm_gpuva_region_insert() on
the entire address space. To me this seems kind of useless, but maybe
I'm missing why you need this for Nouveau.

Matt
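
For context, the "one region spanning the whole address space" setup described
above would presumably boil down to something like the following against the
patch 03 interfaces. The zero-sized kernel reserve and the 48-bit VA size are
assumptions made purely for the sake of the example:

	static int sketch_init_whole_space(struct drm_gpuva_manager *mgr,
					   struct drm_gpuva_region *whole)
	{
		/* @whole is expected to be zero-initialized by the caller;
		 * assumption: a 0/0 kernel reserve is allowed
		 */
		drm_gpuva_manager_init(mgr, "sketch", 0, 1ULL << 48, 0, 0);

		/* a single region covering everything the manager tracks */
		return drm_gpuva_region_insert(mgr, whole, 0, 1ULL << 48);
	}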

> +/**
> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
> + */
> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
> +/**
> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
> + *
> + * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
> + * a previously allocated VA space region. If the region is sparse, existing
> + * sparse mappings are overwritten.
> + */
> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
> +/**
> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
> + *
> + * Unmap an existing mapping in the GPU's VA space. If the region the mapping
> + * is located in is a sparse region, new sparse mappings are created where the
> + * unmapped (memory backed) mapping was mapped previously.
> + */
> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
> +	/**
> +	 * @flags: the flags for a &drm_nouveau_vm_bind_op
> +	 */
> +	__u32 flags;
> +/**
> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
> + *
> + * Indicates that an allocated VA space region should be sparse.
> + */
> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
> +	/**
> +	 * @handle: the handle of the DRM GEM object to map
> +	 */
> +	__u32 handle;
> +	/**
> +	 * @addr:
> +	 *
> +	 * the address the VA space region or (memory backed) mapping should be mapped to
> +	 */
> +	__u64 addr;
> +	/**
> +	 * @bo_offset: the offset within the BO backing the mapping
> +	 */
> +	__u64 bo_offset;
> +	/**
> +	 * @range: the size of the requested mapping in bytes
> +	 */
> +	__u64 range;
> +};
> +
> +/**
> + * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
> + */
> +struct drm_nouveau_vm_bind {
> +	/**
> +	 * @op_count: the number of &drm_nouveau_vm_bind_op
> +	 */
> +	__u32 op_count;
> +	/**
> +	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
> +	 */
> +	__u32 flags;
> +/**
> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
> + *
> + * Indicates that the given VM_BIND operation should be executed asynchronously
> + * by the kernel.
> + *
> + * If this flag is not supplied the kernel executes the associated operations
> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
> + */
> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
> +	/**
> +	 * @wait_count: the number of wait &drm_nouveau_syncs
> +	 */
> +	__u32 wait_count;
> +	/**
> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
> +	 */
> +	__u32 sig_count;
> +	/**
> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
> +	 */
> +	__u64 wait_ptr;
> +	/**
> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
> +	 */
> +	__u64 sig_ptr;
> +	/**
> +	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
> +	 */
> +	__u64 op_ptr;
> +};
> +
> +/**
> + * struct drm_nouveau_exec_push - EXEC push operation
> + *
> + * This structure represents a single EXEC push operation. UMDs should pass an
> + * array of this structure via struct drm_nouveau_exec's &push_ptr field.
> + */
> +struct drm_nouveau_exec_push {
> +	/**
> +	 * @va: the virtual address of the push buffer mapping
> +	 */
> +	__u64 va;
> +	/**
> +	 * @va_len: the length of the push buffer mapping
> +	 */
> +	__u64 va_len;
> +};
> +
> +/**
> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
> + */
> +struct drm_nouveau_exec {
> +	/**
> +	 * @channel: the channel to execute the push buffer in
> +	 */
> +	__u32 channel;
> +	/**
> +	 * @push_count: the number of &drm_nouveau_exec_push ops
> +	 */
> +	__u32 push_count;
> +	/**
> +	 * @wait_count: the number of wait &drm_nouveau_syncs
> +	 */
> +	__u32 wait_count;
> +	/**
> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
> +	 */
> +	__u32 sig_count;
> +	/**
> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
> +	 */
> +	__u64 wait_ptr;
> +	/**
> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
> +	 */
> +	__u64 sig_ptr;
> +	/**
> +	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
> +	 */
> +	__u64 push_ptr;
> +};
> +
>  #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>  #define DRM_NOUVEAU_NVIF               0x07
>  #define DRM_NOUVEAU_SVM_INIT           0x08
>  #define DRM_NOUVEAU_SVM_BIND           0x09
> +#define DRM_NOUVEAU_VM_INIT            0x10
> +#define DRM_NOUVEAU_VM_BIND            0x11
> +#define DRM_NOUVEAU_EXEC               0x12
>  #define DRM_NOUVEAU_GEM_NEW            0x40
>  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>  #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
>  #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>  
> +#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
> +#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
> +#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  1:05   ` Matthew Brost
@ 2023-01-27  1:26     ` Danilo Krummrich
  2023-01-27  7:55       ` Christian König
  2023-01-27  1:43     ` Danilo Krummrich
  1 sibling, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27  1:26 UTC (permalink / raw)
  To: Matthew Brost
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On 1/27/23 02:05, Matthew Brost wrote:
> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>> This commit provides the interfaces for the new UAPI motivated by the
>> Vulkan API. It allows user mode drivers (UMDs) to:
>>
>> 1) Initialize a GPU virtual address (VA) space via the new
>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>>     VA area.
>>
>> 2) Bind and unbind GPU VA space mappings via the new
>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>
>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>> asynchronous processing with DRM syncobjs as synchronization mechanism.
>>
>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>
>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>   include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
>>   2 files changed, 224 insertions(+)
>>
>> diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
>> index 4411e6919a3d..9c7ca6e33a68 100644
>> --- a/Documentation/gpu/driver-uapi.rst
>> +++ b/Documentation/gpu/driver-uapi.rst
>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>   =============
>>   
>>   .. kernel-doc:: include/uapi/drm/i915_drm.h
>> +
>> +drm/nouveau uAPI
>> +================
>> +
>> +VM_BIND / EXEC uAPI
>> +-------------------
>> +
>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
>> index 853a327433d3..f6e7d40201d4 100644
>> --- a/include/uapi/drm/nouveau_drm.h
>> +++ b/include/uapi/drm/nouveau_drm.h
>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>   	__u32 handle;
>>   };
>>   
>> +/**
>> + * struct drm_nouveau_sync - sync object
>> + *
>> + * This structure serves as a synchronization mechanism for (potentially)
>> + * asynchronous operations such as EXEC or VM_BIND.
>> + */
>> +struct drm_nouveau_sync {
>> +	/**
>> +	 * @flags: the flags for a sync object
>> +	 *
>> +	 * The first 8 bits are used to determine the type of the sync object.
>> +	 */
>> +	__u32 flags;
>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>> +	/**
>> +	 * @handle: the handle of the sync object
>> +	 */
>> +	__u32 handle;
>> +	/**
>> +	 * @timeline_value:
>> +	 *
>> +	 * The timeline point of the sync object in case the syncobj is of
>> +	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>> +	 */
>> +	__u64 timeline_value;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>> + *
>> + * Used to initialize the GPU's VA space for a user client, telling the kernel
>> + * which portion of the VA space is managed by the UMD and kernel respectively.
>> + */
>> +struct drm_nouveau_vm_init {
>> +	/**
>> +	 * @unmanaged_addr: start address of the kernel managed VA space region
>> +	 */
>> +	__u64 unmanaged_addr;
>> +	/**
>> +	 * @unmanaged_size: size of the kernel managed VA space region in bytes
>> +	 */
>> +	__u64 unmanaged_size;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>> + *
>> + * This structure represents a single VM_BIND operation. UMDs should pass
>> + * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
>> + */
>> +struct drm_nouveau_vm_bind_op {
>> +	/**
>> +	 * @op: the operation type
>> +	 */
>> +	__u32 op;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>> + *
>> + * The alloc operation is used to reserve a VA space region within the GPU's VA
>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
>> + * instruct the kernel to create sparse mappings for the given region.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
> 
> Do you really need this operation? We have no concept of this in Xe,
> e.g. we can create a VM and the entire address space is managed exactly
> the same.

The idea for alloc/free is to let UMDs allocate a portion of the VA 
space (which I call a region) - basically the same thing Vulkan 
represents with a VkBuffer.

It serves two purposes:

1. It gives the kernel (in particular the GPUVA manager) the bounds 
within which it is allowed to merge mappings. E.g. when a user request 
asks for a new mapping and we detect that it could be merged with an 
existing one (belonging to a different VkBuffer than the one the 
request came for), the driver is not allowed to change the page tables 
of the existing mapping we want to merge with (assuming some drivers 
would need to do this in order to merge), because the existing mapping 
could already be in use and re-mapping it could cause a fault on the GPU.

2. It is used for sparse residency: an allocated VA space region can be 
flagged as sparse, such that the kernel always keeps sparse mappings 
around for the parts of the region that do not contain actual memory 
backed mappings.

If merging is always OK for your driver, creating a single huge region 
would do the trick, I guess. Otherwise, we could also add an option to 
the GPUVA manager (or to a specific region, which could also be a 
single huge one) within which it never merges.
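
To make the intended usage concrete from the UMD side, a minimal, hedged
userspace sketch against the proposed uAPI (quoted above) might look roughly
as follows - reserve a VkBuffer-sized sparse region, then back part of it
with an actual BO in the same synchronous VM_BIND call. The addresses, sizes,
the GEM handle, the in-order processing of ops and the error handling are
purely illustrative assumptions:

	#include <stdint.h>
	#include <string.h>
	#include <xf86drm.h>
	#include "nouveau_drm.h"	/* the proposed uAPI header */

	static int bind_sparse_region_and_map(int fd, uint32_t gem_handle)
	{
		struct drm_nouveau_vm_bind_op ops[2];
		struct drm_nouveau_vm_bind bind;

		memset(ops, 0, sizeof(ops));
		memset(&bind, 0, sizeof(bind));

		/* reserve a 16 MiB sparse region (think: VkBuffer) */
		ops[0].op    = DRM_NOUVEAU_VM_BIND_OP_ALLOC;
		ops[0].flags = DRM_NOUVEAU_VM_BIND_SPARSE;
		ops[0].addr  = 0x100000000ULL;
		ops[0].range = 16 << 20;

		/* back the first 1 MiB of it with a real BO;
		 * assumption: ops are processed in order, so the MAP sees
		 * the region reserved by the ALLOC above
		 */
		ops[1].op        = DRM_NOUVEAU_VM_BIND_OP_MAP;
		ops[1].handle    = gem_handle;
		ops[1].addr      = 0x100000000ULL;
		ops[1].bo_offset = 0;
		ops[1].range     = 1 << 20;

		bind.op_count = 2;
		bind.op_ptr   = (uintptr_t)ops;
		/* no DRM_NOUVEAU_VM_BIND_RUN_ASYNC: runs synchronously, no syncs */

		return drmIoctl(fd, DRM_IOCTL_NOUVEAU_VM_BIND, &bind);
	}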

> 
> If this can be removed then the entire concept of regions in the GPUVA
> can be removed too (drop struct drm_gpuva_region). I say this because
> in Xe as I'm porting over to GPUVA the first thing I'm doing after
> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
> address space. To me this seems kinda useless but maybe I'm missing why
> you need this for Nouveau.
> 
> Matt
> 
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>> + *
>> + * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
>> + * a previously allocated VA space region. If the region is sparse, existing
>> + * sparse mappings are overwritten.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>> + *
>> + * Unmap an existing mapping in the GPU's VA space. If the region the mapping
>> + * is located in is a sparse region, new sparse mappings are created where the
>> + * unmapped (memory backed) mapping was mapped previously.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>> +	/**
>> +	 * @flags: the flags for a &drm_nouveau_vm_bind_op
>> +	 */
>> +	__u32 flags;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>> + *
>> + * Indicates that an allocated VA space region should be sparse.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>> +	/**
>> +	 * @handle: the handle of the DRM GEM object to map
>> +	 */
>> +	__u32 handle;
>> +	/**
>> +	 * @addr:
>> +	 *
>> +	 * the address the VA space region or (memory backed) mapping should be mapped to
>> +	 */
>> +	__u64 addr;
>> +	/**
>> +	 * @bo_offset: the offset within the BO backing the mapping
>> +	 */
>> +	__u64 bo_offset;
>> +	/**
>> +	 * @range: the size of the requested mapping in bytes
>> +	 */
>> +	__u64 range;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
>> + */
>> +struct drm_nouveau_vm_bind {
>> +	/**
>> +	 * @op_count: the number of &drm_nouveau_vm_bind_op
>> +	 */
>> +	__u32 op_count;
>> +	/**
>> +	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>> +	 */
>> +	__u32 flags;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>> + *
>> + * Indicates that the given VM_BIND operation should be executed asynchronously
>> + * by the kernel.
>> + *
>> + * If this flag is not supplied the kernel executes the associated operations
>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>> +	/**
>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>> +	 */
>> +	__u32 wait_count;
>> +	/**
>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u32 sig_count;
>> +	/**
>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>> +	 */
>> +	__u64 wait_ptr;
>> +	/**
>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u64 sig_ptr;
>> +	/**
>> +	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>> +	 */
>> +	__u64 op_ptr;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_exec_push - EXEC push operation
>> + *
>> + * This structure represents a single EXEC push operation. UMDs should pass an
>> + * array of this structure via struct drm_nouveau_exec's &push_ptr field.
>> + */
>> +struct drm_nouveau_exec_push {
>> +	/**
>> +	 * @va: the virtual address of the push buffer mapping
>> +	 */
>> +	__u64 va;
>> +	/**
>> +	 * @va_len: the length of the push buffer mapping
>> +	 */
>> +	__u64 va_len;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>> + */
>> +struct drm_nouveau_exec {
>> +	/**
>> +	 * @channel: the channel to execute the push buffer in
>> +	 */
>> +	__u32 channel;
>> +	/**
>> +	 * @push_count: the number of &drm_nouveau_exec_push ops
>> +	 */
>> +	__u32 push_count;
>> +	/**
>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>> +	 */
>> +	__u32 wait_count;
>> +	/**
>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u32 sig_count;
>> +	/**
>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>> +	 */
>> +	__u64 wait_ptr;
>> +	/**
>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u64 sig_ptr;
>> +	/**
>> +	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
>> +	 */
>> +	__u64 push_ptr;
>> +};
>> +
>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>   #define DRM_NOUVEAU_NVIF               0x07
>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>> +#define DRM_NOUVEAU_VM_INIT            0x10
>> +#define DRM_NOUVEAU_VM_BIND            0x11
>> +#define DRM_NOUVEAU_EXEC               0x12
>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>   
>> +#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>> +#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>> +#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>   #if defined(__cplusplus)
>>   }
>>   #endif
>> -- 
>> 2.39.0
>>
> 



* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  1:05   ` Matthew Brost
  2023-01-27  1:26     ` Danilo Krummrich
@ 2023-01-27  1:43     ` Danilo Krummrich
  2023-01-27  3:21       ` Matthew Brost
  1 sibling, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27  1:43 UTC (permalink / raw)
  To: Matthew Brost
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc



On 1/27/23 02:05, Matthew Brost wrote:
> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>> This commit provides the interfaces for the new UAPI motivated by the
>> Vulkan API. It allows user mode drivers (UMDs) to:
>>
>> 1) Initialize a GPU virtual address (VA) space via the new
>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>>     VA area.
>>
>> 2) Bind and unbind GPU VA space mappings via the new
>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>
>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>
>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>> asynchronous processing with DRM syncobjs as synchronization mechanism.
>>
>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>
>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>   include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
>>   2 files changed, 224 insertions(+)
>>
>> diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
>> index 4411e6919a3d..9c7ca6e33a68 100644
>> --- a/Documentation/gpu/driver-uapi.rst
>> +++ b/Documentation/gpu/driver-uapi.rst
>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>   =============
>>   
>>   .. kernel-doc:: include/uapi/drm/i915_drm.h
>> +
>> +drm/nouveau uAPI
>> +================
>> +
>> +VM_BIND / EXEC uAPI
>> +-------------------
>> +
>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
>> index 853a327433d3..f6e7d40201d4 100644
>> --- a/include/uapi/drm/nouveau_drm.h
>> +++ b/include/uapi/drm/nouveau_drm.h
>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>   	__u32 handle;
>>   };
>>   
>> +/**
>> + * struct drm_nouveau_sync - sync object
>> + *
>> + * This structure serves as synchronization mechanism for (potentially)
>> + * asynchronous operations such as EXEC or VM_BIND.
>> + */
>> +struct drm_nouveau_sync {
>> +	/**
>> +	 * @flags: the flags for a sync object
>> +	 *
>> +	 * The first 8 bits are used to determine the type of the sync object.
>> +	 */
>> +	__u32 flags;
>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>> +	/**
>> +	 * @handle: the handle of the sync object
>> +	 */
>> +	__u32 handle;
>> +	/**
>> +	 * @timeline_value:
>> +	 *
>> +	 * The timeline point of the sync object in case the syncobj is of
>> +	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>> +	 */
>> +	__u64 timeline_value;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>> + *
>> + * Used to initialize the GPU's VA space for a user client, telling the kernel
>> + * which portion of the VA space is managed by the UMD and kernel respectively.
>> + */
>> +struct drm_nouveau_vm_init {
>> +	/**
>> +	 * @unmanaged_addr: start address of the kernel managed VA space region
>> +	 */
>> +	__u64 unmanaged_addr;
>> +	/**
>> +	 * @unmanaged_size: size of the kernel managed VA space region in bytes
>> +	 */
>> +	__u64 unmanaged_size;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>> + *
>> + * This structure represents a single VM_BIND operation. UMDs should pass
>> + * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
>> + */
>> +struct drm_nouveau_vm_bind_op {
>> +	/**
>> +	 * @op: the operation type
>> +	 */
>> +	__u32 op;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>> + *
>> + * The alloc operation is used to reserve a VA space region within the GPU's VA
>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
>> + * instruct the kernel to create sparse mappings for the given region.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
> 
> Do you really need this operation? We have no concept of this in Xe,
> e.g. we can create a VM and the entire address space is managed exactly
> the same.
> 
> If this can be removed then the entire concept of regions in the GPUVA
> can be removed too (drop struct drm_gpuva_region). I say this because
> in Xe as I'm porting over to GPUVA the first thing I'm doing after
> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
> address space. 

Also, since you've been starting to use the code, this [1] is the branch 
I'm pushing my fixes for a v2 to. It already contains the changes for 
the GPUVA manager except for switching away from drm_mm.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-fixes

> To me this seems kinda useless but maybe I'm missing why
> you need this for Nouveau.
> 
> Matt
> 
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>> + *
>> + * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
>> + * a previously allocated VA space region. If the region is sparse, existing
>> + * sparse mappings are overwritten.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>> + *
>> + * Unmap an existing mapping in the GPU's VA space. If the region the mapping
>> + * is located in is a sparse region, new sparse mappings are created where the
>> + * unmapped (memory backed) mapping was mapped previously.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>> +	/**
>> +	 * @flags: the flags for a &drm_nouveau_vm_bind_op
>> +	 */
>> +	__u32 flags;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>> + *
>> + * Indicates that an allocated VA space region should be sparse.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>> +	/**
>> +	 * @handle: the handle of the DRM GEM object to map
>> +	 */
>> +	__u32 handle;
>> +	/**
>> +	 * @addr:
>> +	 *
>> +	 * the address the VA space region or (memory backed) mapping should be mapped to
>> +	 */
>> +	__u64 addr;
>> +	/**
>> +	 * @bo_offset: the offset within the BO backing the mapping
>> +	 */
>> +	__u64 bo_offset;
>> +	/**
>> +	 * @range: the size of the requested mapping in bytes
>> +	 */
>> +	__u64 range;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
>> + */
>> +struct drm_nouveau_vm_bind {
>> +	/**
>> +	 * @op_count: the number of &drm_nouveau_vm_bind_op
>> +	 */
>> +	__u32 op_count;
>> +	/**
>> +	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>> +	 */
>> +	__u32 flags;
>> +/**
>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>> + *
>> + * Indicates that the given VM_BIND operation should be executed asynchronously
>> + * by the kernel.
>> + *
>> + * If this flag is not supplied the kernel executes the associated operations
>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>> + */
>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>> +	/**
>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>> +	 */
>> +	__u32 wait_count;
>> +	/**
>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u32 sig_count;
>> +	/**
>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>> +	 */
>> +	__u64 wait_ptr;
>> +	/**
>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u64 sig_ptr;
>> +	/**
>> +	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>> +	 */
>> +	__u64 op_ptr;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_exec_push - EXEC push operation
>> + *
>> + * This structure represents a single EXEC push operation. UMDs should pass an
>> + * array of this structure via struct drm_nouveau_exec's &push_ptr field.
>> + */
>> +struct drm_nouveau_exec_push {
>> +	/**
>> +	 * @va: the virtual address of the push buffer mapping
>> +	 */
>> +	__u64 va;
>> +	/**
>> +	 * @va_len: the length of the push buffer mapping
>> +	 */
>> +	__u64 va_len;
>> +};
>> +
>> +/**
>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>> + */
>> +struct drm_nouveau_exec {
>> +	/**
>> +	 * @channel: the channel to execute the push buffer in
>> +	 */
>> +	__u32 channel;
>> +	/**
>> +	 * @push_count: the number of &drm_nouveau_exec_push ops
>> +	 */
>> +	__u32 push_count;
>> +	/**
>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>> +	 */
>> +	__u32 wait_count;
>> +	/**
>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u32 sig_count;
>> +	/**
>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>> +	 */
>> +	__u64 wait_ptr;
>> +	/**
>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>> +	 */
>> +	__u64 sig_ptr;
>> +	/**
>> +	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
>> +	 */
>> +	__u64 push_ptr;
>> +};
>> +
>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>   #define DRM_NOUVEAU_NVIF               0x07
>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>> +#define DRM_NOUVEAU_VM_INIT            0x10
>> +#define DRM_NOUVEAU_VM_BIND            0x11
>> +#define DRM_NOUVEAU_EXEC               0x12
>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>   
>> +#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>> +#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>> +#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>   #if defined(__cplusplus)
>>   }
>>   #endif
>> -- 
>> 2.39.0
>>
> 



* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  1:43     ` Danilo Krummrich
@ 2023-01-27  3:21       ` Matthew Brost
  2023-01-27  3:33         ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Matthew Brost @ 2023-01-27  3:21 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Fri, Jan 27, 2023 at 02:43:30AM +0100, Danilo Krummrich wrote:
> 
> 
> On 1/27/23 02:05, Matthew Brost wrote:
> > On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
> > > This commit provides the interfaces for the new UAPI motivated by the
> > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > 
> > > 1) Initialize a GPU virtual address (VA) space via the new
> > >     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
> > >     VA area.
> > > 
> > > 2) Bind and unbind GPU VA space mappings via the new
> > >     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> > > 
> > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> > > 
> > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
> > > asynchronous processing with DRM syncobjs as synchronization mechanism.
> > > 
> > > The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
> > > DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> > > 
> > > Co-authored-by: Dave Airlie <airlied@redhat.com>
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >   Documentation/gpu/driver-uapi.rst |   8 ++
> > >   include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
> > >   2 files changed, 224 insertions(+)
> > > 
> > > diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
> > > index 4411e6919a3d..9c7ca6e33a68 100644
> > > --- a/Documentation/gpu/driver-uapi.rst
> > > +++ b/Documentation/gpu/driver-uapi.rst
> > > @@ -6,3 +6,11 @@ drm/i915 uAPI
> > >   =============
> > >   .. kernel-doc:: include/uapi/drm/i915_drm.h
> > > +
> > > +drm/nouveau uAPI
> > > +================
> > > +
> > > +VM_BIND / EXEC uAPI
> > > +-------------------
> > > +
> > > +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
> > > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> > > index 853a327433d3..f6e7d40201d4 100644
> > > --- a/include/uapi/drm/nouveau_drm.h
> > > +++ b/include/uapi/drm/nouveau_drm.h
> > > @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
> > >   	__u32 handle;
> > >   };
> > > +/**
> > > + * struct drm_nouveau_sync - sync object
> > > + *
> > > + * This structure serves as synchronization mechanism for (potentially)
> > > + * asynchronous operations such as EXEC or VM_BIND.
> > > + */
> > > +struct drm_nouveau_sync {
> > > +	/**
> > > +	 * @flags: the flags for a sync object
> > > +	 *
> > > +	 * The first 8 bits are used to determine the type of the sync object.
> > > +	 */
> > > +	__u32 flags;
> > > +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
> > > +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
> > > +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
> > > +	/**
> > > +	 * @handle: the handle of the sync object
> > > +	 */
> > > +	__u32 handle;
> > > +	/**
> > > +	 * @timeline_value:
> > > +	 *
> > > +	 * The timeline point of the sync object in case the syncobj is of
> > > +	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
> > > +	 */
> > > +	__u64 timeline_value;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_nouveau_vm_init - GPU VA space init structure
> > > + *
> > > + * Used to initialize the GPU's VA space for a user client, telling the kernel
> > > + * which portion of the VA space is managed by the UMD and kernel respectively.
> > > + */
> > > +struct drm_nouveau_vm_init {
> > > +	/**
> > > +	 * @unmanaged_addr: start address of the kernel managed VA space region
> > > +	 */
> > > +	__u64 unmanaged_addr;
> > > +	/**
> > > +	 * @unmanaged_size: size of the kernel managed VA space region in bytes
> > > +	 */
> > > +	__u64 unmanaged_size;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_nouveau_vm_bind_op - VM_BIND operation
> > > + *
> > > + * This structure represents a single VM_BIND operation. UMDs should pass
> > > + * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
> > > + */
> > > +struct drm_nouveau_vm_bind_op {
> > > +	/**
> > > +	 * @op: the operation type
> > > +	 */
> > > +	__u32 op;
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
> > > + *
> > > + * The alloc operation is used to reserve a VA space region within the GPU's VA
> > > + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
> > > + * instruct the kernel to create sparse mappings for the given region.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
> > 
> > Do you really need this operation? We have no concept of this in Xe,
> > e.g. we can create a VM and the entire address space is managed exactly
> > the same.
> > 
> > If this can be removed then the entire concept of regions in the GPUVA
> > can be removed too (drop struct drm_gpuva_region). I say this because
> > in Xe as I'm porting over to GPUVA the first thing I'm doing after
> > drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
> > address space.
> 
> Also, since you've been starting to use the code, this [1] is the branch I'm
> pushing my fixes for a v2 to. It already contains the changes for the GPUVA
> manager except for switching away from drm_mm.
> 
> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-fixes
> 

I will take a look at this branch. I believe you are on our Xe gitlab
project (we're working on getting this public), so you can comment on
any MR I post there. I expect to have something posted early next week
to port Xe to the gpuva.

Also, I assume you are on dri-devel IRC; what is your handle? Mine is
mbrost. It might be useful to chat in real time.

Matt

> > To me this seems kinda useless but maybe I'm missing why
> > you need this for Nouveau.
> > 
> > Matt
> > 
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
> > > + *
> > > + * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
> > > + * a previously allocated VA space region. If the region is sparse, existing
> > > + * sparse mappings are overwritten.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
> > > + *
> > > + * Unmap an existing mapping in the GPU's VA space. If the region the mapping
> > > + * is located in is a sparse region, new sparse mappings are created where the
> > > + * unmapped (memory backed) mapping was mapped previously.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
> > > +	/**
> > > +	 * @flags: the flags for a &drm_nouveau_vm_bind_op
> > > +	 */
> > > +	__u32 flags;
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_SPARSE:
> > > + *
> > > + * Indicates that an allocated VA space region should be sparse.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
> > > +	/**
> > > +	 * @handle: the handle of the DRM GEM object to map
> > > +	 */
> > > +	__u32 handle;
> > > +	/**
> > > +	 * @addr:
> > > +	 *
> > > +	 * the address the VA space region or (memory backed) mapping should be mapped to
> > > +	 */
> > > +	__u64 addr;
> > > +	/**
> > > +	 * @bo_offset: the offset within the BO backing the mapping
> > > +	 */
> > > +	__u64 bo_offset;
> > > +	/**
> > > +	 * @range: the size of the requested mapping in bytes
> > > +	 */
> > > +	__u64 range;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
> > > + */
> > > +struct drm_nouveau_vm_bind {
> > > +	/**
> > > +	 * @op_count: the number of &drm_nouveau_vm_bind_op
> > > +	 */
> > > +	__u32 op_count;
> > > +	/**
> > > +	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
> > > +	 */
> > > +	__u32 flags;
> > > +/**
> > > + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
> > > + *
> > > + * Indicates that the given VM_BIND operation should be executed asynchronously
> > > + * by the kernel.
> > > + *
> > > + * If this flag is not supplied the kernel executes the associated operations
> > > + * synchronously and doesn't accept any &drm_nouveau_sync objects.
> > > + */
> > > +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
> > > +	/**
> > > +	 * @wait_count: the number of wait &drm_nouveau_syncs
> > > +	 */
> > > +	__u32 wait_count;
> > > +	/**
> > > +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
> > > +	 */
> > > +	__u32 sig_count;
> > > +	/**
> > > +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
> > > +	 */
> > > +	__u64 wait_ptr;
> > > +	/**
> > > +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
> > > +	 */
> > > +	__u64 sig_ptr;
> > > +	/**
> > > +	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
> > > +	 */
> > > +	__u64 op_ptr;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_nouveau_exec_push - EXEC push operation
> > > + *
> > > + * This structure represents a single EXEC push operation. UMDs should pass an
> > > + * array of this structure via struct drm_nouveau_exec's &push_ptr field.
> > > + */
> > > +struct drm_nouveau_exec_push {
> > > +	/**
> > > +	 * @va: the virtual address of the push buffer mapping
> > > +	 */
> > > +	__u64 va;
> > > +	/**
> > > +	 * @va_len: the length of the push buffer mapping
> > > +	 */
> > > +	__u64 va_len;
> > > +};
> > > +
> > > +/**
> > > + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
> > > + */
> > > +struct drm_nouveau_exec {
> > > +	/**
> > > +	 * @channel: the channel to execute the push buffer in
> > > +	 */
> > > +	__u32 channel;
> > > +	/**
> > > +	 * @push_count: the number of &drm_nouveau_exec_push ops
> > > +	 */
> > > +	__u32 push_count;
> > > +	/**
> > > +	 * @wait_count: the number of wait &drm_nouveau_syncs
> > > +	 */
> > > +	__u32 wait_count;
> > > +	/**
> > > +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
> > > +	 */
> > > +	__u32 sig_count;
> > > +	/**
> > > +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
> > > +	 */
> > > +	__u64 wait_ptr;
> > > +	/**
> > > +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
> > > +	 */
> > > +	__u64 sig_ptr;
> > > +	/**
> > > +	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
> > > +	 */
> > > +	__u64 push_ptr;
> > > +};
> > > +
> > >   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
> > >   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
> > >   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
> > > @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
> > >   #define DRM_NOUVEAU_NVIF               0x07
> > >   #define DRM_NOUVEAU_SVM_INIT           0x08
> > >   #define DRM_NOUVEAU_SVM_BIND           0x09
> > > +#define DRM_NOUVEAU_VM_INIT            0x10
> > > +#define DRM_NOUVEAU_VM_BIND            0x11
> > > +#define DRM_NOUVEAU_EXEC               0x12
> > >   #define DRM_NOUVEAU_GEM_NEW            0x40
> > >   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
> > >   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> > > @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
> > >   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
> > >   #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
> > > +#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
> > > +#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
> > > +#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> > >   #if defined(__cplusplus)
> > >   }
> > >   #endif
> > > -- 
> > > 2.39.0
> > > 
> > 
> 


* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  3:21       ` Matthew Brost
@ 2023-01-27  3:33         ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27  3:33 UTC (permalink / raw)
  To: Matthew Brost
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On 1/27/23 04:21, Matthew Brost wrote:
> On Fri, Jan 27, 2023 at 02:43:30AM +0100, Danilo Krummrich wrote:
>>
>>
>> On 1/27/23 02:05, Matthew Brost wrote:
>>> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>>>> This commit provides the interfaces for the new UAPI motivated by the
>>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>>
>>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>>      DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>>>>      VA area.
>>>>
>>>> 2) Bind and unbind GPU VA space mappings via the new
>>>>      DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>>
>>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>
>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>>>> asynchronous processing with DRM syncobjs as synchronization mechanism.
>>>>
>>>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>
>>>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>    Documentation/gpu/driver-uapi.rst |   8 ++
>>>>    include/uapi/drm/nouveau_drm.h    | 216 ++++++++++++++++++++++++++++++
>>>>    2 files changed, 224 insertions(+)
>>>>
>>>> diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst
>>>> index 4411e6919a3d..9c7ca6e33a68 100644
>>>> --- a/Documentation/gpu/driver-uapi.rst
>>>> +++ b/Documentation/gpu/driver-uapi.rst
>>>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>>>    =============
>>>>    .. kernel-doc:: include/uapi/drm/i915_drm.h
>>>> +
>>>> +drm/nouveau uAPI
>>>> +================
>>>> +
>>>> +VM_BIND / EXEC uAPI
>>>> +-------------------
>>>> +
>>>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>>>> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
>>>> index 853a327433d3..f6e7d40201d4 100644
>>>> --- a/include/uapi/drm/nouveau_drm.h
>>>> +++ b/include/uapi/drm/nouveau_drm.h
>>>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>>>    	__u32 handle;
>>>>    };
>>>> +/**
>>>> + * struct drm_nouveau_sync - sync object
>>>> + *
>>>> + * This structure serves as synchronization mechanism for (potentially)
>>>> + * asynchronous operations such as EXEC or VM_BIND.
>>>> + */
>>>> +struct drm_nouveau_sync {
>>>> +	/**
>>>> +	 * @flags: the flags for a sync object
>>>> +	 *
>>>> +	 * The first 8 bits are used to determine the type of the sync object.
>>>> +	 */
>>>> +	__u32 flags;
>>>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>>>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>>>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>>>> +	/**
>>>> +	 * @handle: the handle of the sync object
>>>> +	 */
>>>> +	__u32 handle;
>>>> +	/**
>>>> +	 * @timeline_value:
>>>> +	 *
>>>> +	 * The timeline point of the sync object in case the syncobj is of
>>>> +	 * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>>>> +	 */
>>>> +	__u64 timeline_value;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>>>> + *
>>>> + * Used to initialize the GPU's VA space for a user client, telling the kernel
>>>> + * which portion of the VA space is managed by the UMD and kernel respectively.
>>>> + */
>>>> +struct drm_nouveau_vm_init {
>>>> +	/**
>>>> +	 * @unmanaged_addr: start address of the kernel managed VA space region
>>>> +	 */
>>>> +	__u64 unmanaged_addr;
>>>> +	/**
>>>> +	 * @unmanaged_size: size of the kernel managed VA space region in bytes
>>>> +	 */
>>>> +	__u64 unmanaged_size;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>>>> + *
>>>> + * This structure represents a single VM_BIND operation. UMDs should pass
>>>> + * an array of this structure via struct drm_nouveau_vm_bind's &op_ptr field.
>>>> + */
>>>> +struct drm_nouveau_vm_bind_op {
>>>> +	/**
>>>> +	 * @op: the operation type
>>>> +	 */
>>>> +	__u32 op;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>>>> + *
>>>> + * The alloc operation is used to reserve a VA space region within the GPU's VA
>>>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to
>>>> + * instruct the kernel to create sparse mappings for the given region.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
>>>
>>> Do you really need this operation? We have no concept of this in Xe,
>>> e.g. we can create a VM and the entire address space is managed exactly
>>> the same.
>>>
>>> If this can be removed then the entire concept of regions in the GPUVA
>>> can be removed too (drop struct drm_gpuva_region). I say this because
>>> in Xe as I'm porting over to GPUVA the first thing I'm doing after
>>> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
>>> address space.
>>
>> Also, since you've been starting to use the code, this [1] is the branch I'm
>> pushing my fixes for a v2 to. It already contains the changes for the GPUVA
>> manager except for switching away from drm_mm.
>>
>> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-fixes
>>
> 
> I will take a look at this branch. I believe you are on our Xe gitlab
> project (working on getting this public) so you can comment on any MR I
> post there, I expect to have something posted early next week to port Xe
> to the gpuva.
> 

Yes, I am.

> Also I assume you are dri-devel IRC, what is your handle? Mine is
> mbrost. It might be useful to chat in real time.

Mine is dakr. I just pinged you in #dri-devel, but it seems your client 
timed out shortly after, so I expect it didn't reach you.

- Danilo

> 
> Matt
> 
>>> To me this seems kinda useless but maybe I'm missing why
>>> you need this for Nouveau.
>>>
>>> Matt
>>>
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>>>> + *
>>>> + * Map a GEM object to the GPU's VA space. The mapping must be fully enclosed by
>>>> + * a previously allocated VA space region. If the region is sparse, existing
>>>> + * sparse mappings are overwritten.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>>>> + *
>>>> + * Unmap an existing mapping in the GPU's VA space. If the region the mapping
>>>> + * is located in is a sparse region, new sparse mappings are created where the
>>>> + * unmapped (memory backed) mapping was mapped previously.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>>>> +	/**
>>>> +	 * @flags: the flags for a &drm_nouveau_vm_bind_op
>>>> +	 */
>>>> +	__u32 flags;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>>>> + *
>>>> + * Indicates that an allocated VA space region should be sparse.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>>>> +	/**
>>>> +	 * @handle: the handle of the DRM GEM object to map
>>>> +	 */
>>>> +	__u32 handle;
>>>> +	/**
>>>> +	 * @addr:
>>>> +	 *
>>>> +	 * the address the VA space region or (memory backed) mapping should be mapped to
>>>> +	 */
>>>> +	__u64 addr;
>>>> +	/**
>>>> +	 * @bo_offset: the offset within the BO backing the mapping
>>>> +	 */
>>>> +	__u64 bo_offset;
>>>> +	/**
>>>> +	 * @range: the size of the requested mapping in bytes
>>>> +	 */
>>>> +	__u64 range;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_bind - structure for DRM_IOCTL_NOUVEAU_VM_BIND
>>>> + */
>>>> +struct drm_nouveau_vm_bind {
>>>> +	/**
>>>> +	 * @op_count: the number of &drm_nouveau_vm_bind_op
>>>> +	 */
>>>> +	__u32 op_count;
>>>> +	/**
>>>> +	 * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>>>> +	 */
>>>> +	__u32 flags;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>>>> + *
>>>> + * Indicates that the given VM_BIND operation should be executed asynchronously
>>>> + * by the kernel.
>>>> + *
>>>> + * If this flag is not supplied the kernel executes the associated operations
>>>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>>>> +	/**
>>>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>>>> +	 */
>>>> +	__u32 wait_count;
>>>> +	/**
>>>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>>>> +	 */
>>>> +	__u32 sig_count;
>>>> +	/**
>>>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>> +	 */
>>>> +	__u64 wait_ptr;
>>>> +	/**
>>>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>>> +	 */
>>>> +	__u64 sig_ptr;
>>>> +	/**
>>>> +	 * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>>>> +	 */
>>>> +	__u64 op_ptr;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_exec_push - EXEC push operation
>>>> + *
>>>> + * This structure represents a single EXEC push operation. UMDs should pass an
>>>> + * array of this structure via struct drm_nouveau_exec's &push_ptr field.
>>>> + */
>>>> +struct drm_nouveau_exec_push {
>>>> +	/**
>>>> +	 * @va: the virtual address of the push buffer mapping
>>>> +	 */
>>>> +	__u64 va;
>>>> +	/**
>>>> +	 * @va_len: the length of the push buffer mapping
>>>> +	 */
>>>> +	__u64 va_len;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>>>> + */
>>>> +struct drm_nouveau_exec {
>>>> +	/**
>>>> +	 * @channel: the channel to execute the push buffer in
>>>> +	 */
>>>> +	__u32 channel;
>>>> +	/**
>>>> +	 * @push_count: the number of &drm_nouveau_exec_push ops
>>>> +	 */
>>>> +	__u32 push_count;
>>>> +	/**
>>>> +	 * @wait_count: the number of wait &drm_nouveau_syncs
>>>> +	 */
>>>> +	__u32 wait_count;
>>>> +	/**
>>>> +	 * @sig_count: the number of &drm_nouveau_syncs to signal when finished
>>>> +	 */
>>>> +	__u32 sig_count;
>>>> +	/**
>>>> +	 * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>> +	 */
>>>> +	__u64 wait_ptr;
>>>> +	/**
>>>> +	 * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>>> +	 */
>>>> +	__u64 sig_ptr;
>>>> +	/**
>>>> +	 * @push_ptr: pointer to &drm_nouveau_exec_push ops
>>>> +	 */
>>>> +	__u64 push_ptr;
>>>> +};
>>>> +
>>>>    #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>>>    #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>>>    #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>>>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>>>    #define DRM_NOUVEAU_NVIF               0x07
>>>>    #define DRM_NOUVEAU_SVM_INIT           0x08
>>>>    #define DRM_NOUVEAU_SVM_BIND           0x09
>>>> +#define DRM_NOUVEAU_VM_INIT            0x10
>>>> +#define DRM_NOUVEAU_VM_BIND            0x11
>>>> +#define DRM_NOUVEAU_EXEC               0x12
>>>>    #define DRM_NOUVEAU_GEM_NEW            0x40
>>>>    #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>>>    #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>>>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>>>    #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
>>>>    #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>>> +#define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>>>> +#define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>>>> +#define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>>>    #if defined(__cplusplus)
>>>>    }
>>>>    #endif
>>>> -- 
>>>> 2.39.0
>>>>
>>>
>>
> 



* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  1:26     ` Danilo Krummrich
@ 2023-01-27  7:55       ` Christian König
  2023-01-27 13:12         ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-27  7:55 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

Am 27.01.23 um 02:26 schrieb Danilo Krummrich:
> On 1/27/23 02:05, Matthew Brost wrote:
>> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>>> This commit provides the interfaces for the new UAPI motivated by the
>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>
>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>>>     VA area.
>>>
>>> 2) Bind and unbind GPU VA space mappings via the new
>>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>
>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>
>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>>> asynchronous processing with DRM syncobjs as synchronization mechanism.
>>>
>>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>
>>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>> ---
>>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>>   include/uapi/drm/nouveau_drm.h    | 216 
>>> ++++++++++++++++++++++++++++++
>>>   2 files changed, 224 insertions(+)
>>>
>>> diff --git a/Documentation/gpu/driver-uapi.rst 
>>> b/Documentation/gpu/driver-uapi.rst
>>> index 4411e6919a3d..9c7ca6e33a68 100644
>>> --- a/Documentation/gpu/driver-uapi.rst
>>> +++ b/Documentation/gpu/driver-uapi.rst
>>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>>   =============
>>>     .. kernel-doc:: include/uapi/drm/i915_drm.h
>>> +
>>> +drm/nouveau uAPI
>>> +================
>>> +
>>> +VM_BIND / EXEC uAPI
>>> +-------------------
>>> +
>>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>>> diff --git a/include/uapi/drm/nouveau_drm.h 
>>> b/include/uapi/drm/nouveau_drm.h
>>> index 853a327433d3..f6e7d40201d4 100644
>>> --- a/include/uapi/drm/nouveau_drm.h
>>> +++ b/include/uapi/drm/nouveau_drm.h
>>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>>       __u32 handle;
>>>   };
>>>   +/**
>>> + * struct drm_nouveau_sync - sync object
>>> + *
>>> + * This structure serves as synchronization mechanism for 
>>> (potentially)
>>> + * asynchronous operations such as EXEC or VM_BIND.
>>> + */
>>> +struct drm_nouveau_sync {
>>> +    /**
>>> +     * @flags: the flags for a sync object
>>> +     *
>>> +     * The first 8 bits are used to determine the type of the sync 
>>> object.
>>> +     */
>>> +    __u32 flags;
>>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>>> +    /**
>>> +     * @handle: the handle of the sync object
>>> +     */
>>> +    __u32 handle;
>>> +    /**
>>> +     * @timeline_value:
>>> +     *
>>> +     * The timeline point of the sync object in case the syncobj is of
>>> +     * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>>> +     */
>>> +    __u64 timeline_value;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>>> + *
>>> + * Used to initialize the GPU's VA space for a user client, telling 
>>> the kernel
>>> + * which portion of the VA space is managed by the UMD and kernel 
>>> respectively.
>>> + */
>>> +struct drm_nouveau_vm_init {
>>> +    /**
>>> +     * @unmanaged_addr: start address of the kernel managed VA 
>>> space region
>>> +     */
>>> +    __u64 unmanaged_addr;
>>> +    /**
>>> +     * @unmanaged_size: size of the kernel managed VA space region 
>>> in bytes
>>> +     */
>>> +    __u64 unmanaged_size;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>>> + *
>>> + * This structure represents a single VM_BIND operation. UMDs 
>>> should pass
>>> + * an array of this structure via struct drm_nouveau_vm_bind's 
>>> &op_ptr field.
>>> + */
>>> +struct drm_nouveau_vm_bind_op {
>>> +    /**
>>> +     * @op: the operation type
>>> +     */
>>> +    __u32 op;
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>>> + *
>>> + * The alloc operation is used to reserve a VA space region within 
>>> the GPU's VA
>>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be 
>>> passed to
>>> + * instruct the kernel to create sparse mappings for the given region.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
>>
>> Do you really need this operation? We have no concept of this in Xe,
>> e.g. we can create a VM and the entire address space is managed exactly
>> the same.
>
> The idea for alloc/free is to let UMDs allocate a portion of the VA 
> space (which I call a region), basically the same thing Vulkan 
> represents with a VKBuffer.

If that's mangled into the same component/interface, then I can say from 
experience that this is a pretty bad idea. We have tried something 
similar with radeon and it turned out horribly.

What you want is one component for tracking the VA allocations (drm_mm 
based) and a different component/interface for tracking the VA mappings 
(probably rb tree based).

amdgpu has even gone so far as to track the VA allocations in libdrm in 
userspace.
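
To illustrate what I mean by keeping those two things separate, a very 
rough sketch (made up names, not actual radeon/amdgpu code) could look 
like this:

    #include <linux/types.h>
    #include <linux/mutex.h>
    #include <linux/rbtree.h>
    #include <drm/drm_mm.h>
    #include <drm/drm_gem.h>

    struct foo_va_space {
        /* VA allocations: what userspace has reserved */
        struct drm_mm alloc_mm;
        /* VA mappings: what is actually mapped, keyed by VA */
        struct rb_root_cached mappings;
        struct mutex lock;
    };

    /* one reserved VA range */
    struct foo_va_alloc {
        struct drm_mm_node node;
    };

    /* one GEM backed mapping */
    struct foo_va_mapping {
        struct rb_node rb;
        u64 addr, range;
        struct drm_gem_object *obj;
        u64 obj_offset;
    };

The allocation side can then even be moved out of the kernel entirely, 
like we did with libdrm for amdgpu, without touching the mapping 
tracking at all.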

Regards,
Christian.

>
> It serves two purposes:
>
> 1. It gives the kernel (in particular the GPUVA manager) the bounds in 
> which it is allowed to merge mappings. E.g. when a user request asks 
> for a new mapping and we detect we could merge this mapping with an 
> existing one (used in another VKBuffer than the mapping request came 
> for) the driver is not allowed to change the page table for the 
> existing mapping we want to merge with (assuming that some drivers 
> would need to do this in order to merge), because the existing mapping 
> could already be in use and by re-mapping it we'd potentially cause a 
> fault on the GPU.
>
> 2. It is used for sparse residency in a way that such an allocated VA 
> space region can be flagged as sparse, such that the kernel always 
> keeps sparse mappings around for the parts of the region that do not 
> contain actual memory backed mappings.
>
> If for your driver merging is always OK, creating a single huge region 
> would do the trick I guess. Otherwise, we could also add an option to 
> the GPUVA manager (or a specific region, which could also be a single 
> huge one) within which it never merges.
>
>>
>> If this can be removed then the entire concept of regions in the GPUVA
>> can be removed too (drop struct drm_gpuva_region). I say this because
>> in Xe as I'm porting over to GPUVA the first thing I'm doing after
>> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
>> address space. To me this seems kinda useless but maybe I'm missing why
>> you need this for Nouveau.
>>
>> Matt
>>
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>>> + *
>>> + * Map a GEM object to the GPU's VA space. The mapping must be 
>>> fully enclosed by
>>> + * a previously allocated VA space region. If the region is sparse, 
>>> existing
>>> + * sparse mappings are overwritten.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>>> + *
>>> + * Unmap an existing mapping in the GPU's VA space. If the region 
>>> the mapping
>>> + * is located in is a sparse region, new sparse mappings are 
>>> created where the
>>> + * unmapped (memory backed) mapping was mapped previously.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>>> +    /**
>>> +     * @flags: the flags for a &drm_nouveau_vm_bind_op
>>> +     */
>>> +    __u32 flags;
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>>> + *
>>> + * Indicates that an allocated VA space region should be sparse.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>>> +    /**
>>> +     * @handle: the handle of the DRM GEM object to map
>>> +     */
>>> +    __u32 handle;
>>> +    /**
>>> +     * @addr:
>>> +     *
>>> +     * the address the VA space region or (memory backed) mapping 
>>> should be mapped to
>>> +     */
>>> +    __u64 addr;
>>> +    /**
>>> +     * @bo_offset: the offset within the BO backing the mapping
>>> +     */
>>> +    __u64 bo_offset;
>>> +    /**
>>> +     * @range: the size of the requested mapping in bytes
>>> +     */
>>> +    __u64 range;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_nouveau_vm_bind - structure for 
>>> DRM_IOCTL_NOUVEAU_VM_BIND
>>> + */
>>> +struct drm_nouveau_vm_bind {
>>> +    /**
>>> +     * @op_count: the number of &drm_nouveau_vm_bind_op
>>> +     */
>>> +    __u32 op_count;
>>> +    /**
>>> +     * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>>> +     */
>>> +    __u32 flags;
>>> +/**
>>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>>> + *
>>> + * Indicates that the given VM_BIND operation should be executed 
>>> asynchronously
>>> + * by the kernel.
>>> + *
>>> + * If this flag is not supplied the kernel executes the associated 
>>> operations
>>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>>> + */
>>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>>> +    /**
>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>> +     */
>>> +    __u32 wait_count;
>>> +    /**
>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal when 
>>> finished
>>> +     */
>>> +    __u32 sig_count;
>>> +    /**
>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>> +     */
>>> +    __u64 wait_ptr;
>>> +    /**
>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>> +     */
>>> +    __u64 sig_ptr;
>>> +    /**
>>> +     * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>>> +     */
>>> +    __u64 op_ptr;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_nouveau_exec_push - EXEC push operation
>>> + *
>>> + * This structure represents a single EXEC push operation. UMDs 
>>> should pass an
>>> + * array of this structure via struct drm_nouveau_exec's &push_ptr 
>>> field.
>>> + */
>>> +struct drm_nouveau_exec_push {
>>> +    /**
>>> +     * @va: the virtual address of the push buffer mapping
>>> +     */
>>> +    __u64 va;
>>> +    /**
>>> +     * @va_len: the length of the push buffer mapping
>>> +     */
>>> +    __u64 va_len;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>>> + */
>>> +struct drm_nouveau_exec {
>>> +    /**
>>> +     * @channel: the channel to execute the push buffer in
>>> +     */
>>> +    __u32 channel;
>>> +    /**
>>> +     * @push_count: the number of &drm_nouveau_exec_push ops
>>> +     */
>>> +    __u32 push_count;
>>> +    /**
>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>> +     */
>>> +    __u32 wait_count;
>>> +    /**
>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal when 
>>> finished
>>> +     */
>>> +    __u32 sig_count;
>>> +    /**
>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>> +     */
>>> +    __u64 wait_ptr;
>>> +    /**
>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>> +     */
>>> +    __u64 sig_ptr;
>>> +    /**
>>> +     * @push_ptr: pointer to &drm_nouveau_exec_push ops
>>> +     */
>>> +    __u64 push_ptr;
>>> +};
>>> +
>>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>>   #define DRM_NOUVEAU_NVIF               0x07
>>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>>> +#define DRM_NOUVEAU_VM_INIT            0x10
>>> +#define DRM_NOUVEAU_VM_BIND            0x11
>>> +#define DRM_NOUVEAU_EXEC               0x12
>>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW 
>>> (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct 
>>> drm_nouveau_gem_cpu_fini)
>>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>>   +#define DRM_IOCTL_NOUVEAU_VM_INIT DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>>> +#define DRM_IOCTL_NOUVEAU_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>>> +#define DRM_IOCTL_NOUVEAU_EXEC DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>>   #if defined(__cplusplus)
>>>   }
>>>   #endif
>>> -- 
>>> 2.39.0
>>>
>>
>



* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27  7:55       ` Christian König
@ 2023-01-27 13:12         ` Danilo Krummrich
  2023-01-27 13:23           ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27 13:12 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 1/27/23 08:55, Christian König wrote:
> Am 27.01.23 um 02:26 schrieb Danilo Krummrich:
>> On 1/27/23 02:05, Matthew Brost wrote:
>>> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>>>> This commit provides the interfaces for the new UAPI motivated by the
>>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>>
>>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel reserved
>>>>     VA area.
>>>>
>>>> 2) Bind and unbind GPU VA space mappings via the new
>>>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>>
>>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>
>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>>>> asynchronous processing with DRM syncobjs as synchronization mechanism.
>>>>
>>>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>
>>>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>>>   include/uapi/drm/nouveau_drm.h    | 216 
>>>> ++++++++++++++++++++++++++++++
>>>>   2 files changed, 224 insertions(+)
>>>>
>>>> diff --git a/Documentation/gpu/driver-uapi.rst 
>>>> b/Documentation/gpu/driver-uapi.rst
>>>> index 4411e6919a3d..9c7ca6e33a68 100644
>>>> --- a/Documentation/gpu/driver-uapi.rst
>>>> +++ b/Documentation/gpu/driver-uapi.rst
>>>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>>>   =============
>>>>     .. kernel-doc:: include/uapi/drm/i915_drm.h
>>>> +
>>>> +drm/nouveau uAPI
>>>> +================
>>>> +
>>>> +VM_BIND / EXEC uAPI
>>>> +-------------------
>>>> +
>>>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>>>> diff --git a/include/uapi/drm/nouveau_drm.h 
>>>> b/include/uapi/drm/nouveau_drm.h
>>>> index 853a327433d3..f6e7d40201d4 100644
>>>> --- a/include/uapi/drm/nouveau_drm.h
>>>> +++ b/include/uapi/drm/nouveau_drm.h
>>>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>>>       __u32 handle;
>>>>   };
>>>>   +/**
>>>> + * struct drm_nouveau_sync - sync object
>>>> + *
>>>> + * This structure serves as synchronization mechanism for 
>>>> (potentially)
>>>> + * asynchronous operations such as EXEC or VM_BIND.
>>>> + */
>>>> +struct drm_nouveau_sync {
>>>> +    /**
>>>> +     * @flags: the flags for a sync object
>>>> +     *
>>>> +     * The first 8 bits are used to determine the type of the sync 
>>>> object.
>>>> +     */
>>>> +    __u32 flags;
>>>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>>>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>>>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>>>> +    /**
>>>> +     * @handle: the handle of the sync object
>>>> +     */
>>>> +    __u32 handle;
>>>> +    /**
>>>> +     * @timeline_value:
>>>> +     *
>>>> +     * The timeline point of the sync object in case the syncobj is of
>>>> +     * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>>>> +     */
>>>> +    __u64 timeline_value;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>>>> + *
>>>> + * Used to initialize the GPU's VA space for a user client, telling 
>>>> the kernel
>>>> + * which portion of the VA space is managed by the UMD and kernel 
>>>> respectively.
>>>> + */
>>>> +struct drm_nouveau_vm_init {
>>>> +    /**
>>>> +     * @unmanaged_addr: start address of the kernel managed VA 
>>>> space region
>>>> +     */
>>>> +    __u64 unmanaged_addr;
>>>> +    /**
>>>> +     * @unmanaged_size: size of the kernel managed VA space region 
>>>> in bytes
>>>> +     */
>>>> +    __u64 unmanaged_size;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>>>> + *
>>>> + * This structure represents a single VM_BIND operation. UMDs 
>>>> should pass
>>>> + * an array of this structure via struct drm_nouveau_vm_bind's 
>>>> &op_ptr field.
>>>> + */
>>>> +struct drm_nouveau_vm_bind_op {
>>>> +    /**
>>>> +     * @op: the operation type
>>>> +     */
>>>> +    __u32 op;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>>>> + *
>>>> + * The alloc operation is used to reserve a VA space region within 
>>>> the GPU's VA
>>>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be 
>>>> passed to
>>>> + * instruct the kernel to create sparse mappings for the given region.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
>>>
>>> Do you really need this operation? We have no concept of this in Xe,
>>> e.g. we can create a VM and the entire address space is managed exactly
>>> the same.
>>
>> The idea for alloc/free is to let UMDs allocate a portion of the VA 
>> space (which I call a region), basically the same thing Vulkan 
>> represents with a VKBuffer.
> 
> If that's mangled into the same component/interface then I can say from 
> experience that this is a pretty bad idea. We have tried something 
> similar with radeon and it turned out horrible.

What was the exact setup in radeon, and which problems arose from it?

> 
> What you want is one component for tracking the VA allocations (drm_mm 
> based) and a different component/interface for tracking the VA mappings 
> (probably rb tree based).

That's what the GPUVA manager is doing. There are gpuva_regions, which 
correspond to VA allocations, and gpuvas, which represent the mappings. 
Both are tracked separately (currently both with a separate drm_mm, 
though). However, the GPUVA manager needs to take regions into account 
when dealing with mappings to make sure it doesn't propose that drivers 
merge across region boundaries. Speaking from a userspace PoV, the kernel 
wouldn't merge mappings from different VKBuffer objects even if they're 
virtually and physically contiguous.

For sparse residency the kernel also needs to know the region boundaries 
to make sure that it keeps sparse mappings around.
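
To make the constraint concrete, a merge check would roughly look like the 
following (names are made up for this mail, this is not the RFC's actual 
code):

#include <linux/types.h>

/* purely illustrative types, not the RFC's structures */
struct example_region  { u64 va_start, va_end; };
struct example_mapping { u64 va_start, va_end, bo_offset; u32 bo_handle; };

/*
 * Two mappings may only be merged if they are contiguous in VA and BO
 * space *and* both lie within the same region, i.e. the same VKBuffer.
 */
static bool example_can_merge(const struct example_region *reg,
                              const struct example_mapping *a,
                              const struct example_mapping *b)
{
        if (a->bo_handle != b->bo_handle)
                return false;
        if (a->va_end != b->va_start)
                return false;
        if (a->bo_offset + (a->va_end - a->va_start) != b->bo_offset)
                return false;
        /* never merge across a region boundary */
        return a->va_start >= reg->va_start && b->va_end <= reg->va_end;
}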

> 
> amdgpu has even gotten so far that the VA allocations are tracked in 
> libdrm in userspace
> 
> Regards,
> Christian.
> 
>>
>> It serves two purposes:
>>
>> 1. It gives the kernel (in particular the GPUVA manager) the bounds in 
>> which it is allowed to merge mappings. E.g. when a user request asks 
>> for a new mapping and we detect we could merge this mapping with an 
>> existing one (used in another VKBuffer than the mapping request came 
>> for) the driver is not allowed to change the page table for the 
>> existing mapping we want to merge with (assuming that some drivers 
>> would need to do this in order to merge), because the existing mapping 
>> could already be in use and by re-mapping it we'd potentially cause a 
>> fault on the GPU.
>>
>> 2. It is used for sparse residency in a way that such an allocated VA 
>> space region can be flagged as sparse, such that the kernel always 
>> keeps sparse mappings around for the parts of the region that do not 
>> contain actual memory backed mappings.
>>
>> If for your driver merging is always OK, creating a single huge region 
>> would do the trick I guess. Otherwise, we could also add an option to 
>> the GPUVA manager (or a specific region, which could also be a single 
>> huge one) within which it never merges.
>>
>>>
>>> If this can be removed then the entire concept of regions in the GPUVA
>>> can be removed too (drop struct drm_gpuva_region). I say this because
>>> in Xe as I'm porting over to GPUVA the first thing I'm doing after
>>> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the entire
>>> address space. To me this seems kinda useless but maybe I'm missing why
>>> you need this for Nouveau.
>>>
>>> Matt
>>>
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>>>> + *
>>>> + * Map a GEM object to the GPU's VA space. The mapping must be 
>>>> fully enclosed by
>>>> + * a previously allocated VA space region. If the region is sparse, 
>>>> existing
>>>> + * sparse mappings are overwritten.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>>>> + *
>>>> + * Unmap an existing mapping in the GPU's VA space. If the region 
>>>> the mapping
>>>> + * is located in is a sparse region, new sparse mappings are 
>>>> created where the
>>>> + * unmapped (memory backed) mapping was mapped previously.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>>>> +    /**
>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind_op
>>>> +     */
>>>> +    __u32 flags;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>>>> + *
>>>> + * Indicates that an allocated VA space region should be sparse.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>>>> +    /**
>>>> +     * @handle: the handle of the DRM GEM object to map
>>>> +     */
>>>> +    __u32 handle;
>>>> +    /**
>>>> +     * @addr:
>>>> +     *
>>>> +     * the address the VA space region or (memory backed) mapping 
>>>> should be mapped to
>>>> +     */
>>>> +    __u64 addr;
>>>> +    /**
>>>> +     * @bo_offset: the offset within the BO backing the mapping
>>>> +     */
>>>> +    __u64 bo_offset;
>>>> +    /**
>>>> +     * @range: the size of the requested mapping in bytes
>>>> +     */
>>>> +    __u64 range;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_vm_bind - structure for 
>>>> DRM_IOCTL_NOUVEAU_VM_BIND
>>>> + */
>>>> +struct drm_nouveau_vm_bind {
>>>> +    /**
>>>> +     * @op_count: the number of &drm_nouveau_vm_bind_op
>>>> +     */
>>>> +    __u32 op_count;
>>>> +    /**
>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>>>> +     */
>>>> +    __u32 flags;
>>>> +/**
>>>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>>>> + *
>>>> + * Indicates that the given VM_BIND operation should be executed 
>>>> asynchronously
>>>> + * by the kernel.
>>>> + *
>>>> + * If this flag is not supplied the kernel executes the associated 
>>>> operations
>>>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>>>> + */
>>>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>>>> +    /**
>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>> +     */
>>>> +    __u32 wait_count;
>>>> +    /**
>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal when 
>>>> finished
>>>> +     */
>>>> +    __u32 sig_count;
>>>> +    /**
>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>> +     */
>>>> +    __u64 wait_ptr;
>>>> +    /**
>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>>> +     */
>>>> +    __u64 sig_ptr;
>>>> +    /**
>>>> +     * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>>>> +     */
>>>> +    __u64 op_ptr;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_exec_push - EXEC push operation
>>>> + *
>>>> + * This structure represents a single EXEC push operation. UMDs 
>>>> should pass an
>>>> + * array of this structure via struct drm_nouveau_exec's &push_ptr 
>>>> field.
>>>> + */
>>>> +struct drm_nouveau_exec_push {
>>>> +    /**
>>>> +     * @va: the virtual address of the push buffer mapping
>>>> +     */
>>>> +    __u64 va;
>>>> +    /**
>>>> +     * @va_len: the length of the push buffer mapping
>>>> +     */
>>>> +    __u64 va_len;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>>>> + */
>>>> +struct drm_nouveau_exec {
>>>> +    /**
>>>> +     * @channel: the channel to execute the push buffer in
>>>> +     */
>>>> +    __u32 channel;
>>>> +    /**
>>>> +     * @push_count: the number of &drm_nouveau_exec_push ops
>>>> +     */
>>>> +    __u32 push_count;
>>>> +    /**
>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>> +     */
>>>> +    __u32 wait_count;
>>>> +    /**
>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal when 
>>>> finished
>>>> +     */
>>>> +    __u32 sig_count;
>>>> +    /**
>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>> +     */
>>>> +    __u64 wait_ptr;
>>>> +    /**
>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when finished
>>>> +     */
>>>> +    __u64 sig_ptr;
>>>> +    /**
>>>> +     * @push_ptr: pointer to &drm_nouveau_exec_push ops
>>>> +     */
>>>> +    __u64 push_ptr;
>>>> +};
>>>> +
>>>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>>>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>>>   #define DRM_NOUVEAU_NVIF               0x07
>>>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>>>> +#define DRM_NOUVEAU_VM_INIT            0x10
>>>> +#define DRM_NOUVEAU_VM_BIND            0x11
>>>> +#define DRM_NOUVEAU_EXEC               0x12
>>>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>>>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW 
>>>> (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct 
>>>> drm_nouveau_gem_cpu_fini)
>>>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO DRM_IOWR(DRM_COMMAND_BASE + 
>>>> DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>>>   +#define DRM_IOCTL_NOUVEAU_VM_INIT DRM_IOWR(DRM_COMMAND_BASE + 
>>>> DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>>>> +#define DRM_IOCTL_NOUVEAU_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>> DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>>>> +#define DRM_IOCTL_NOUVEAU_EXEC DRM_IOWR(DRM_COMMAND_BASE + 
>>>> DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>>>   #if defined(__cplusplus)
>>>>   }
>>>>   #endif
>>>> -- 
>>>> 2.39.0
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 13:12         ` Danilo Krummrich
@ 2023-01-27 13:23           ` Christian König
  2023-01-27 14:44             ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-01-27 13:23 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc



Am 27.01.23 um 14:12 schrieb Danilo Krummrich:
> On 1/27/23 08:55, Christian König wrote:
>> Am 27.01.23 um 02:26 schrieb Danilo Krummrich:
>>> On 1/27/23 02:05, Matthew Brost wrote:
>>>> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>>>>> This commit provides the interfaces for the new UAPI motivated by the
>>>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>>>
>>>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel 
>>>>> reserved
>>>>>     VA area.
>>>>>
>>>>> 2) Bind and unbind GPU VA space mappings via the new
>>>>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>>>
>>>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>>
>>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>>>>> asynchronous processing with DRM syncobjs as synchronization 
>>>>> mechanism.
>>>>>
>>>>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>>
>>>>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>> ---
>>>>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>>>>   include/uapi/drm/nouveau_drm.h    | 216 
>>>>> ++++++++++++++++++++++++++++++
>>>>>   2 files changed, 224 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/gpu/driver-uapi.rst 
>>>>> b/Documentation/gpu/driver-uapi.rst
>>>>> index 4411e6919a3d..9c7ca6e33a68 100644
>>>>> --- a/Documentation/gpu/driver-uapi.rst
>>>>> +++ b/Documentation/gpu/driver-uapi.rst
>>>>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>>>>   =============
>>>>>     .. kernel-doc:: include/uapi/drm/i915_drm.h
>>>>> +
>>>>> +drm/nouveau uAPI
>>>>> +================
>>>>> +
>>>>> +VM_BIND / EXEC uAPI
>>>>> +-------------------
>>>>> +
>>>>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>>>>> diff --git a/include/uapi/drm/nouveau_drm.h 
>>>>> b/include/uapi/drm/nouveau_drm.h
>>>>> index 853a327433d3..f6e7d40201d4 100644
>>>>> --- a/include/uapi/drm/nouveau_drm.h
>>>>> +++ b/include/uapi/drm/nouveau_drm.h
>>>>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>>>>       __u32 handle;
>>>>>   };
>>>>>   +/**
>>>>> + * struct drm_nouveau_sync - sync object
>>>>> + *
>>>>> + * This structure serves as synchronization mechanism for 
>>>>> (potentially)
>>>>> + * asynchronous operations such as EXEC or VM_BIND.
>>>>> + */
>>>>> +struct drm_nouveau_sync {
>>>>> +    /**
>>>>> +     * @flags: the flags for a sync object
>>>>> +     *
>>>>> +     * The first 8 bits are used to determine the type of the 
>>>>> sync object.
>>>>> +     */
>>>>> +    __u32 flags;
>>>>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>>>>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>>>>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>>>>> +    /**
>>>>> +     * @handle: the handle of the sync object
>>>>> +     */
>>>>> +    __u32 handle;
>>>>> +    /**
>>>>> +     * @timeline_value:
>>>>> +     *
>>>>> +     * The timeline point of the sync object in case the syncobj 
>>>>> is of
>>>>> +     * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>>>>> +     */
>>>>> +    __u64 timeline_value;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>>>>> + *
>>>>> + * Used to initialize the GPU's VA space for a user client, 
>>>>> telling the kernel
>>>>> + * which portion of the VA space is managed by the UMD and kernel 
>>>>> respectively.
>>>>> + */
>>>>> +struct drm_nouveau_vm_init {
>>>>> +    /**
>>>>> +     * @unmanaged_addr: start address of the kernel managed VA 
>>>>> space region
>>>>> +     */
>>>>> +    __u64 unmanaged_addr;
>>>>> +    /**
>>>>> +     * @unmanaged_size: size of the kernel managed VA space 
>>>>> region in bytes
>>>>> +     */
>>>>> +    __u64 unmanaged_size;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>>>>> + *
>>>>> + * This structure represents a single VM_BIND operation. UMDs 
>>>>> should pass
>>>>> + * an array of this structure via struct drm_nouveau_vm_bind's 
>>>>> &op_ptr field.
>>>>> + */
>>>>> +struct drm_nouveau_vm_bind_op {
>>>>> +    /**
>>>>> +     * @op: the operation type
>>>>> +     */
>>>>> +    __u32 op;
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>>>>> + *
>>>>> + * The alloc operation is used to reserve a VA space region 
>>>>> within the GPU's VA
>>>>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be 
>>>>> passed to
>>>>> + * instruct the kernel to create sparse mappings for the given 
>>>>> region.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
>>>>
>>>> Do you really need this operation? We have no concept of this in Xe,
>>>> e.g. we can create a VM and the entire address space is managed 
>>>> exactly
>>>> the same.
>>>
>>> The idea for alloc/free is to let UMDs allocate a portion of the VA 
>>> space (which I call a region), basically the same thing Vulkan 
>>> represents with a VKBuffer.
>>
>> If that's mangled into the same component/interface then I can say 
>> from experience that this is a pretty bad idea. We have tried 
>> something similar with radeon and it turned out horrible.
>
> What was the exact constellation in radeon and which problems did 
> arise from it?
>
>>
>> What you want is one component for tracking the VA allocations 
>> (drm_mm based) and a different component/interface for tracking the 
>> VA mappings (probably rb tree based).
>
> That's what the GPUVA manager is doing. There are gpuva_regions which 
> correspond to VA allocations and gpuvas which represent the mappings. 
> Both are tracked separately (currently both with a separate drm_mm, 
> though). However, the GPUVA manager needs to take regions into account 
> when dealing with mappings to make sure the GPUVA manager doesn't 
> propose drivers to merge over region boundaries. Speaking from 
> userspace PoV, the kernel wouldn't merge mappings from different 
> VKBuffer objects even if they're virtually and physically contiguous.

Those are two completely different things and shouldn't be handled in a 
single component.

We should probably talk about the design of the GPUVA manager once more 
if it is supposed to be applicable to all GPU drivers.

>
> For sparse residency the kernel also needs to know the region 
> boundaries to make sure that it keeps sparse mappings around.

What?

Regards,
Christian.

>
>>
>> amdgpu has even gotten so far that the VA allocations are tracked in 
>> libdrm in userspace
>>
>> Regards,
>> Christian.
>>
>>>
>>> It serves two purposes:
>>>
>>> 1. It gives the kernel (in particular the GPUVA manager) the bounds 
>>> in which it is allowed to merge mappings. E.g. when a user request 
>>> asks for a new mapping and we detect we could merge this mapping 
>>> with an existing one (used in another VKBuffer than the mapping 
>>> request came for) the driver is not allowed to change the page table 
>>> for the existing mapping we want to merge with (assuming that some 
>>> drivers would need to do this in order to merge), because the 
>>> existing mapping could already be in use and by re-mapping it we'd 
>>> potentially cause a fault on the GPU.
>>>
>>> 2. It is used for sparse residency in a way that such an allocated 
>>> VA space region can be flagged as sparse, such that the kernel 
>>> always keeps sparse mappings around for the parts of the region that 
>>> do not contain actual memory backed mappings.
>>>
>>> If for your driver merging is always OK, creating a single huge 
>>> region would do the trick I guess. Otherwise, we could also add an 
>>> option to the GPUVA manager (or a specific region, which could also 
>>> be a single huge one) within which it never merges.
>>>
>>>>
>>>> If this can be removed then the entire concept of regions in the GPUVA
>>>> can be removed too (drop struct drm_gpuva_region). I say this because
>>>> in Xe as I'm porting over to GPUVA the first thing I'm doing after
>>>> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the 
>>>> entire
>>>> address space. To me this seems kinda useless but maybe I'm missing 
>>>> why
>>>> you need this for Nouveau.
>>>>
>>>> Matt
>>>>
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>>>>> + *
>>>>> + * Map a GEM object to the GPU's VA space. The mapping must be 
>>>>> fully enclosed by
>>>>> + * a previously allocated VA space region. If the region is 
>>>>> sparse, existing
>>>>> + * sparse mappings are overwritten.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>>>>> + *
>>>>> + * Unmap an existing mapping in the GPU's VA space. If the region 
>>>>> the mapping
>>>>> + * is located in is a sparse region, new sparse mappings are 
>>>>> created where the
>>>>> + * unmapped (memory backed) mapping was mapped previously.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>>>>> +    /**
>>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind_op
>>>>> +     */
>>>>> +    __u32 flags;
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>>>>> + *
>>>>> + * Indicates that an allocated VA space region should be sparse.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>>>>> +    /**
>>>>> +     * @handle: the handle of the DRM GEM object to map
>>>>> +     */
>>>>> +    __u32 handle;
>>>>> +    /**
>>>>> +     * @addr:
>>>>> +     *
>>>>> +     * the address the VA space region or (memory backed) mapping 
>>>>> should be mapped to
>>>>> +     */
>>>>> +    __u64 addr;
>>>>> +    /**
>>>>> +     * @bo_offset: the offset within the BO backing the mapping
>>>>> +     */
>>>>> +    __u64 bo_offset;
>>>>> +    /**
>>>>> +     * @range: the size of the requested mapping in bytes
>>>>> +     */
>>>>> +    __u64 range;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_nouveau_vm_bind - structure for 
>>>>> DRM_IOCTL_NOUVEAU_VM_BIND
>>>>> + */
>>>>> +struct drm_nouveau_vm_bind {
>>>>> +    /**
>>>>> +     * @op_count: the number of &drm_nouveau_vm_bind_op
>>>>> +     */
>>>>> +    __u32 op_count;
>>>>> +    /**
>>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>>>>> +     */
>>>>> +    __u32 flags;
>>>>> +/**
>>>>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>>>>> + *
>>>>> + * Indicates that the given VM_BIND operation should be executed 
>>>>> asynchronously
>>>>> + * by the kernel.
>>>>> + *
>>>>> + * If this flag is not supplied the kernel executes the 
>>>>> associated operations
>>>>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>>>>> + */
>>>>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>>>>> +    /**
>>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>>> +     */
>>>>> +    __u32 wait_count;
>>>>> +    /**
>>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal 
>>>>> when finished
>>>>> +     */
>>>>> +    __u32 sig_count;
>>>>> +    /**
>>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>>> +     */
>>>>> +    __u64 wait_ptr;
>>>>> +    /**
>>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when 
>>>>> finished
>>>>> +     */
>>>>> +    __u64 sig_ptr;
>>>>> +    /**
>>>>> +     * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>>>>> +     */
>>>>> +    __u64 op_ptr;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_nouveau_exec_push - EXEC push operation
>>>>> + *
>>>>> + * This structure represents a single EXEC push operation. UMDs 
>>>>> should pass an
>>>>> + * array of this structure via struct drm_nouveau_exec's 
>>>>> &push_ptr field.
>>>>> + */
>>>>> +struct drm_nouveau_exec_push {
>>>>> +    /**
>>>>> +     * @va: the virtual address of the push buffer mapping
>>>>> +     */
>>>>> +    __u64 va;
>>>>> +    /**
>>>>> +     * @va_len: the length of the push buffer mapping
>>>>> +     */
>>>>> +    __u64 va_len;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>>>>> + */
>>>>> +struct drm_nouveau_exec {
>>>>> +    /**
>>>>> +     * @channel: the channel to execute the push buffer in
>>>>> +     */
>>>>> +    __u32 channel;
>>>>> +    /**
>>>>> +     * @push_count: the number of &drm_nouveau_exec_push ops
>>>>> +     */
>>>>> +    __u32 push_count;
>>>>> +    /**
>>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>>> +     */
>>>>> +    __u32 wait_count;
>>>>> +    /**
>>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal 
>>>>> when finished
>>>>> +     */
>>>>> +    __u32 sig_count;
>>>>> +    /**
>>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>>> +     */
>>>>> +    __u64 wait_ptr;
>>>>> +    /**
>>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when 
>>>>> finished
>>>>> +     */
>>>>> +    __u64 sig_ptr;
>>>>> +    /**
>>>>> +     * @push_ptr: pointer to &drm_nouveau_exec_push ops
>>>>> +     */
>>>>> +    __u64 push_ptr;
>>>>> +};
>>>>> +
>>>>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>>>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>>>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>>>>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>>>>   #define DRM_NOUVEAU_NVIF               0x07
>>>>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>>>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>>>>> +#define DRM_NOUVEAU_VM_INIT            0x10
>>>>> +#define DRM_NOUVEAU_VM_BIND            0x11
>>>>> +#define DRM_NOUVEAU_EXEC               0x12
>>>>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>>>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>>>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>>>>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>>>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW 
>>>>> (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct 
>>>>> drm_nouveau_gem_cpu_fini)
>>>>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>>>>   +#define DRM_IOCTL_NOUVEAU_VM_INIT DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>>>>> +#define DRM_IOCTL_NOUVEAU_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>>>>> +#define DRM_IOCTL_NOUVEAU_EXEC DRM_IOWR(DRM_COMMAND_BASE + 
>>>>> DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>>>>   #if defined(__cplusplus)
>>>>>   }
>>>>>   #endif
>>>>> -- 
>>>>> 2.39.0
>>>>>
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 13:23           ` Christian König
@ 2023-01-27 14:44             ` Danilo Krummrich
  2023-01-27 15:17               ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27 14:44 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 1/27/23 14:23, Christian König wrote:
> 
> 
> Am 27.01.23 um 14:12 schrieb Danilo Krummrich:
>> On 1/27/23 08:55, Christian König wrote:
>>> Am 27.01.23 um 02:26 schrieb Danilo Krummrich:
>>>> On 1/27/23 02:05, Matthew Brost wrote:
>>>>> On Wed, Jan 18, 2023 at 07:12:47AM +0100, Danilo Krummrich wrote:
>>>>>> This commit provides the interfaces for the new UAPI motivated by the
>>>>>> Vulkan API. It allows user mode drivers (UMDs) to:
>>>>>>
>>>>>> 1) Initialize a GPU virtual address (VA) space via the new
>>>>>>     DRM_IOCTL_NOUVEAU_VM_INIT ioctl. UMDs can provide a kernel 
>>>>>> reserved
>>>>>>     VA area.
>>>>>>
>>>>>> 2) Bind and unbind GPU VA space mappings via the new
>>>>>>     DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>>>>>
>>>>>> 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>>>>>
>>>>>> Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC support
>>>>>> asynchronous processing with DRM syncobjs as synchronization 
>>>>>> mechanism.
>>>>>>
>>>>>> The default DRM_IOCTL_NOUVEAU_VM_BIND is synchronous processing,
>>>>>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>>>>>
>>>>>> Co-authored-by: Dave Airlie <airlied@redhat.com>
>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>> ---
>>>>>>   Documentation/gpu/driver-uapi.rst |   8 ++
>>>>>>   include/uapi/drm/nouveau_drm.h    | 216 
>>>>>> ++++++++++++++++++++++++++++++
>>>>>>   2 files changed, 224 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/gpu/driver-uapi.rst 
>>>>>> b/Documentation/gpu/driver-uapi.rst
>>>>>> index 4411e6919a3d..9c7ca6e33a68 100644
>>>>>> --- a/Documentation/gpu/driver-uapi.rst
>>>>>> +++ b/Documentation/gpu/driver-uapi.rst
>>>>>> @@ -6,3 +6,11 @@ drm/i915 uAPI
>>>>>>   =============
>>>>>>     .. kernel-doc:: include/uapi/drm/i915_drm.h
>>>>>> +
>>>>>> +drm/nouveau uAPI
>>>>>> +================
>>>>>> +
>>>>>> +VM_BIND / EXEC uAPI
>>>>>> +-------------------
>>>>>> +
>>>>>> +.. kernel-doc:: include/uapi/drm/nouveau_drm.h
>>>>>> diff --git a/include/uapi/drm/nouveau_drm.h 
>>>>>> b/include/uapi/drm/nouveau_drm.h
>>>>>> index 853a327433d3..f6e7d40201d4 100644
>>>>>> --- a/include/uapi/drm/nouveau_drm.h
>>>>>> +++ b/include/uapi/drm/nouveau_drm.h
>>>>>> @@ -126,6 +126,216 @@ struct drm_nouveau_gem_cpu_fini {
>>>>>>       __u32 handle;
>>>>>>   };
>>>>>>   +/**
>>>>>> + * struct drm_nouveau_sync - sync object
>>>>>> + *
>>>>>> + * This structure serves as synchronization mechanism for 
>>>>>> (potentially)
>>>>>> + * asynchronous operations such as EXEC or VM_BIND.
>>>>>> + */
>>>>>> +struct drm_nouveau_sync {
>>>>>> +    /**
>>>>>> +     * @flags: the flags for a sync object
>>>>>> +     *
>>>>>> +     * The first 8 bits are used to determine the type of the 
>>>>>> sync object.
>>>>>> +     */
>>>>>> +    __u32 flags;
>>>>>> +#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
>>>>>> +#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
>>>>>> +#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
>>>>>> +    /**
>>>>>> +     * @handle: the handle of the sync object
>>>>>> +     */
>>>>>> +    __u32 handle;
>>>>>> +    /**
>>>>>> +     * @timeline_value:
>>>>>> +     *
>>>>>> +     * The timeline point of the sync object in case the syncobj 
>>>>>> is of
>>>>>> +     * type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.
>>>>>> +     */
>>>>>> +    __u64 timeline_value;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_nouveau_vm_init - GPU VA space init structure
>>>>>> + *
>>>>>> + * Used to initialize the GPU's VA space for a user client, 
>>>>>> telling the kernel
>>>>>> + * which portion of the VA space is managed by the UMD and kernel 
>>>>>> respectively.
>>>>>> + */
>>>>>> +struct drm_nouveau_vm_init {
>>>>>> +    /**
>>>>>> +     * @unmanaged_addr: start address of the kernel managed VA 
>>>>>> space region
>>>>>> +     */
>>>>>> +    __u64 unmanaged_addr;
>>>>>> +    /**
>>>>>> +     * @unmanaged_size: size of the kernel managed VA space 
>>>>>> region in bytes
>>>>>> +     */
>>>>>> +    __u64 unmanaged_size;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_nouveau_vm_bind_op - VM_BIND operation
>>>>>> + *
>>>>>> + * This structure represents a single VM_BIND operation. UMDs 
>>>>>> should pass
>>>>>> + * an array of this structure via struct drm_nouveau_vm_bind's 
>>>>>> &op_ptr field.
>>>>>> + */
>>>>>> +struct drm_nouveau_vm_bind_op {
>>>>>> +    /**
>>>>>> +     * @op: the operation type
>>>>>> +     */
>>>>>> +    __u32 op;
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_ALLOC:
>>>>>> + *
>>>>>> + * The alloc operation is used to reserve a VA space region 
>>>>>> within the GPU's VA
>>>>>> + * space. Optionally, the &DRM_NOUVEAU_VM_BIND_SPARSE flag can be 
>>>>>> passed to
>>>>>> + * instruct the kernel to create sparse mappings for the given 
>>>>>> region.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_ALLOC 0x0
>>>>>
>>>>> Do you really need this operation? We have no concept of this in Xe,
>>>>> e.g. we can create a VM and the entire address space is managed 
>>>>> exactly
>>>>> the same.
>>>>
>>>> The idea for alloc/free is to let UMDs allocate a portion of the VA 
>>>> space (which I call a region), basically the same thing Vulkan 
>>>> represents with a VKBuffer.
>>>
>>> If that's mangled into the same component/interface then I can say 
>>> from experience that this is a pretty bad idea. We have tried 
>>> something similar with radeon and it turned out horrible.
>>
>> What was the exact constellation in radeon and which problems did 
>> arise from it?
>>
>>>
>>> What you want is one component for tracking the VA allocations 
>>> (drm_mm based) and a different component/interface for tracking the 
>>> VA mappings (probably rb tree based).
>>
>> That's what the GPUVA manager is doing. There are gpuva_regions which 
>> correspond to VA allocations and gpuvas which represent the mappings. 
>> Both are tracked separately (currently both with a separate drm_mm, 
>> though). However, the GPUVA manager needs to take regions into account 
>> when dealing with mappings to make sure the GPUVA manager doesn't 
>> propose drivers to merge over region boundaries. Speaking from 
>> userspace PoV, the kernel wouldn't merge mappings from different 
>> VKBuffer objects even if they're virtually and physically contiguous.
> 
> That are two completely different things and shouldn't be handled in a 
> single component.

They are different things, but they're related: for handling the 
mappings (in particular merging and sparse) the GPUVA manager needs to 
know the VA allocation (or region) boundaries.

I have the feeling there might be a misunderstanding. Userspace is in 
charge of actually allocating a portion of VA space and managing it. The 
GPUVA manager just needs to know about those VA space allocations and 
hence keeps track of them.

The GPUVA manager is not meant to be an allocator in the sense of 
finding and providing a hole for a given request.

Maybe the non-ideal choice of using drm_mm was implying something else.

> 
> We should probably talk about the design of the GPUVA manager once more 
> when this should be applicable to all GPU drivers.

That's what I'm trying to figure out with this RFC: how to make it 
applicable to all GPU drivers. So I'm happy to discuss this. :-)

> 
>>
>> For sparse residency the kernel also needs to know the region 
>> boundaries to make sure that it keeps sparse mappings around.
> 
> What?

When userspace creates a new VKBuffer with the 
VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create sparse 
mappings in order to ensure that using this buffer without any memory 
backed mappings doesn't fault the GPU.

Currently, the implementation does this the following way:

1. Userspace creates a new VKBuffer and hence allocates a portion of the 
VA space for it. It calls into the kernel indicating the new VA space 
region and the fact that the region is sparse.

2. The kernel picks up the region and stores it in the GPUVA manager, 
the driver creates the corresponding sparse mappings / page table entries.

3. Userspace might ask the driver to create a couple of memory backed 
mappings for this particular VA region. The GPUVA manager stores the 
mapping parameters, the driver creates the corresponding page table entries.

4. Userspace might ask to unmap all the memory backed mappings from this 
particular VA region. The GPUVA manager removes the mapping parameters, 
the driver cleans up the corresponding page table entries. However, since 
it's a sparse buffer, the driver also needs to re-create the sparse 
mappings, and hence it needs to know the boundaries of the region to 
create the sparse mappings in.
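
From the userspace side the above roughly maps onto the proposed uAPI as 
follows (a minimal, synchronous sketch; error handling is omitted and the 
offsets/sizes are obviously made up):

#include <stdint.h>
#include <sys/ioctl.h>

#include "nouveau_drm.h" /* the patched uAPI header from this series */

static int vm_bind_one(int fd, uint32_t op, uint32_t flags, uint32_t handle,
                       uint64_t addr, uint64_t bo_offset, uint64_t range)
{
        struct drm_nouveau_vm_bind_op bop = {
                .op = op, .flags = flags, .handle = handle,
                .addr = addr, .bo_offset = bo_offset, .range = range,
        };
        struct drm_nouveau_vm_bind bind = {
                .op_count = 1,
                .flags = 0, /* no RUN_ASYNC: synchronous, no syncobjs */
                .op_ptr = (uint64_t)(uintptr_t)&bop,
        };

        return ioctl(fd, DRM_IOCTL_NOUVEAU_VM_BIND, &bind);
}

static void sparse_buffer_example(int fd, uint32_t bo, uint64_t va,
                                  uint64_t size)
{
        /* 1./2. reserve the region va..va+size and ask for sparse mappings */
        vm_bind_one(fd, DRM_NOUVEAU_VM_BIND_OP_ALLOC,
                    DRM_NOUVEAU_VM_BIND_SPARSE, 0, va, 0, size);

        /* 3. back a part of the region with a memory backed mapping */
        vm_bind_one(fd, DRM_NOUVEAU_VM_BIND_OP_MAP, 0, bo,
                    va + 0x10000, 0, 0x10000);

        /* 4. unmap it again; sparse mappings are re-created in that range */
        vm_bind_one(fd, DRM_NOUVEAU_VM_BIND_OP_UNMAP, 0, 0,
                    va + 0x10000, 0, 0x10000);
}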

> 
> Regards,
> Christian.
> 
>>
>>>
>>> amdgpu has even gotten so far that the VA allocations are tracked in 
>>> libdrm in userspace
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> It serves two purposes:
>>>>
>>>> 1. It gives the kernel (in particular the GPUVA manager) the bounds 
>>>> in which it is allowed to merge mappings. E.g. when a user request 
>>>> asks for a new mapping and we detect we could merge this mapping 
>>>> with an existing one (used in another VKBuffer than the mapping 
>>>> request came for) the driver is not allowed to change the page table 
>>>> for the existing mapping we want to merge with (assuming that some 
>>>> drivers would need to do this in order to merge), because the 
>>>> existing mapping could already be in use and by re-mapping it we'd 
>>>> potentially cause a fault on the GPU.
>>>>
>>>> 2. It is used for sparse residency in a way that such an allocated 
>>>> VA space region can be flagged as sparse, such that the kernel 
>>>> always keeps sparse mappings around for the parts of the region that 
>>>> do not contain actual memory backed mappings.
>>>>
>>>> If for your driver merging is always OK, creating a single huge 
>>>> region would do the trick I guess. Otherwise, we could also add an 
>>>> option to the GPUVA manager (or a specific region, which could also 
>>>> be a single huge one) within which it never merges.
>>>>
>>>>>
>>>>> If this can be removed then the entire concept of regions in the GPUVA
>>>>> can be removed too (drop struct drm_gpuva_region). I say this because
>>>>> in Xe as I'm porting over to GPUVA the first thing I'm doing after
>>>>> drm_gpuva_manager_init is calling drm_gpuva_region_insert on the 
>>>>> entire
>>>>> address space. To me this seems kinda useless but maybe I'm missing 
>>>>> why
>>>>> you need this for Nouveau.
>>>>>
>>>>> Matt
>>>>>
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_FREE: Free a reserved VA space region.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_FREE 0x1
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_MAP:
>>>>>> + *
>>>>>> + * Map a GEM object to the GPU's VA space. The mapping must be 
>>>>>> fully enclosed by
>>>>>> + * a previously allocated VA space region. If the region is 
>>>>>> sparse, existing
>>>>>> + * sparse mappings are overwritten.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x2
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_OP_UNMAP:
>>>>>> + *
>>>>>> + * Unmap an existing mapping in the GPU's VA space. If the region 
>>>>>> the mapping
>>>>>> + * is located in is a sparse region, new sparse mappings are 
>>>>>> created where the
>>>>>> + * unmapped (memory backed) mapping was mapped previously.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x3
>>>>>> +    /**
>>>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind_op
>>>>>> +     */
>>>>>> +    __u32 flags;
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_SPARSE:
>>>>>> + *
>>>>>> + * Indicates that an allocated VA space region should be sparse.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
>>>>>> +    /**
>>>>>> +     * @handle: the handle of the DRM GEM object to map
>>>>>> +     */
>>>>>> +    __u32 handle;
>>>>>> +    /**
>>>>>> +     * @addr:
>>>>>> +     *
>>>>>> +     * the address the VA space region or (memory backed) mapping 
>>>>>> should be mapped to
>>>>>> +     */
>>>>>> +    __u64 addr;
>>>>>> +    /**
>>>>>> +     * @bo_offset: the offset within the BO backing the mapping
>>>>>> +     */
>>>>>> +    __u64 bo_offset;
>>>>>> +    /**
>>>>>> +     * @range: the size of the requested mapping in bytes
>>>>>> +     */
>>>>>> +    __u64 range;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_nouveau_vm_bind - structure for 
>>>>>> DRM_IOCTL_NOUVEAU_VM_BIND
>>>>>> + */
>>>>>> +struct drm_nouveau_vm_bind {
>>>>>> +    /**
>>>>>> +     * @op_count: the number of &drm_nouveau_vm_bind_op
>>>>>> +     */
>>>>>> +    __u32 op_count;
>>>>>> +    /**
>>>>>> +     * @flags: the flags for a &drm_nouveau_vm_bind ioctl
>>>>>> +     */
>>>>>> +    __u32 flags;
>>>>>> +/**
>>>>>> + * @DRM_NOUVEAU_VM_BIND_RUN_ASYNC:
>>>>>> + *
>>>>>> + * Indicates that the given VM_BIND operation should be executed 
>>>>>> asynchronously
>>>>>> + * by the kernel.
>>>>>> + *
>>>>>> + * If this flag is not supplied the kernel executes the 
>>>>>> associated operations
>>>>>> + * synchronously and doesn't accept any &drm_nouveau_sync objects.
>>>>>> + */
>>>>>> +#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
>>>>>> +    /**
>>>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>>>> +     */
>>>>>> +    __u32 wait_count;
>>>>>> +    /**
>>>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal 
>>>>>> when finished
>>>>>> +     */
>>>>>> +    __u32 sig_count;
>>>>>> +    /**
>>>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>>>> +     */
>>>>>> +    __u64 wait_ptr;
>>>>>> +    /**
>>>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when 
>>>>>> finished
>>>>>> +     */
>>>>>> +    __u64 sig_ptr;
>>>>>> +    /**
>>>>>> +     * @op_ptr: pointer to the &drm_nouveau_vm_bind_ops to execute
>>>>>> +     */
>>>>>> +    __u64 op_ptr;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_nouveau_exec_push - EXEC push operation
>>>>>> + *
>>>>>> + * This structure represents a single EXEC push operation. UMDs 
>>>>>> should pass an
>>>>>> + * array of this structure via struct drm_nouveau_exec's 
>>>>>> &push_ptr field.
>>>>>> + */
>>>>>> +struct drm_nouveau_exec_push {
>>>>>> +    /**
>>>>>> +     * @va: the virtual address of the push buffer mapping
>>>>>> +     */
>>>>>> +    __u64 va;
>>>>>> +    /**
>>>>>> +     * @va_len: the length of the push buffer mapping
>>>>>> +     */
>>>>>> +    __u64 va_len;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_nouveau_exec - structure for DRM_IOCTL_NOUVEAU_EXEC
>>>>>> + */
>>>>>> +struct drm_nouveau_exec {
>>>>>> +    /**
>>>>>> +     * @channel: the channel to execute the push buffer in
>>>>>> +     */
>>>>>> +    __u32 channel;
>>>>>> +    /**
>>>>>> +     * @push_count: the number of &drm_nouveau_exec_push ops
>>>>>> +     */
>>>>>> +    __u32 push_count;
>>>>>> +    /**
>>>>>> +     * @wait_count: the number of wait &drm_nouveau_syncs
>>>>>> +     */
>>>>>> +    __u32 wait_count;
>>>>>> +    /**
>>>>>> +     * @sig_count: the number of &drm_nouveau_syncs to signal 
>>>>>> when finished
>>>>>> +     */
>>>>>> +    __u32 sig_count;
>>>>>> +    /**
>>>>>> +     * @wait_ptr: pointer to &drm_nouveau_syncs to wait for
>>>>>> +     */
>>>>>> +    __u64 wait_ptr;
>>>>>> +    /**
>>>>>> +     * @sig_ptr: pointer to &drm_nouveau_syncs to signal when 
>>>>>> finished
>>>>>> +     */
>>>>>> +    __u64 sig_ptr;
>>>>>> +    /**
>>>>>> +     * @push_ptr: pointer to &drm_nouveau_exec_push ops
>>>>>> +     */
>>>>>> +    __u64 push_ptr;
>>>>>> +};
>>>>>> +
>>>>>>   #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
>>>>>>   #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>>>>>>   #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
>>>>>> @@ -136,6 +346,9 @@ struct drm_nouveau_gem_cpu_fini {
>>>>>>   #define DRM_NOUVEAU_NVIF               0x07
>>>>>>   #define DRM_NOUVEAU_SVM_INIT           0x08
>>>>>>   #define DRM_NOUVEAU_SVM_BIND           0x09
>>>>>> +#define DRM_NOUVEAU_VM_INIT            0x10
>>>>>> +#define DRM_NOUVEAU_VM_BIND            0x11
>>>>>> +#define DRM_NOUVEAU_EXEC               0x12
>>>>>>   #define DRM_NOUVEAU_GEM_NEW            0x40
>>>>>>   #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>>>>>>   #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
>>>>>> @@ -197,6 +410,9 @@ struct drm_nouveau_svm_bind {
>>>>>>   #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW 
>>>>>> (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct 
>>>>>> drm_nouveau_gem_cpu_fini)
>>>>>>   #define DRM_IOCTL_NOUVEAU_GEM_INFO DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>> DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
>>>>>>   +#define DRM_IOCTL_NOUVEAU_VM_INIT DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>> DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>>>>>> +#define DRM_IOCTL_NOUVEAU_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>> DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>>>>>> +#define DRM_IOCTL_NOUVEAU_EXEC DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>> DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>>>>>>   #if defined(__cplusplus)
>>>>>>   }
>>>>>>   #endif
>>>>>> -- 
>>>>>> 2.39.0
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 14:44             ` Danilo Krummrich
@ 2023-01-27 15:17               ` Christian König
  2023-01-27 20:25                 ` David Airlie
  2023-01-27 21:09                 ` Danilo Krummrich
  0 siblings, 2 replies; 75+ messages in thread
From: Christian König @ 2023-01-27 15:17 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
> [SNIP]
>>>>
>>>> What you want is one component for tracking the VA allocations 
>>>> (drm_mm based) and a different component/interface for tracking the 
>>>> VA mappings (probably rb tree based).
>>>
>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>> which correspond to VA allocations and gpuvas which represent the 
>>> mappings. Both are tracked separately (currently both with a 
>>> separate drm_mm, though). However, the GPUVA manager needs to take 
>>> regions into account when dealing with mappings to make sure the 
>>> GPUVA manager doesn't propose drivers to merge over region 
>>> boundaries. Speaking from userspace PoV, the kernel wouldn't merge 
>>> mappings from different VKBuffer objects even if they're virtually 
>>> and physically contiguous.
>>
>> That are two completely different things and shouldn't be handled in 
>> a single component.
>
> They are different things, but they're related in a way that for 
> handling the mappings (in particular merging and sparse) the GPUVA 
> manager needs to know the VA allocation (or region) boundaries.
>
> I have the feeling there might be a misunderstanding. Userspace is in 
> charge to actually allocate a portion of VA space and manage it. The 
> GPUVA manager just needs to know about those VA space allocations and 
> hence keeps track of them.
>
> The GPUVA manager is not meant to be an allocator in the sense of 
> finding and providing a hole for a given request.
>
> Maybe the non-ideal choice of using drm_mm was implying something else.

Uff, well long story short that doesn't even remotely match the 
requirements. This way the GPUVA manager won't be usable for a whole 
bunch of use cases.

What we have are mappings which say X needs to point to Y with 
such-and-such hw dependent flags.

The whole idea of having ranges is not going to fly. Neither with AMD 
GPUs, and I strongly think not with Intel's Xe either.

>> We should probably talk about the design of the GPUVA manager once 
>> more when this should be applicable to all GPU drivers.
>
> That's what I try to figure out with this RFC, how to make it 
> appicable for all GPU drivers, so I'm happy to discuss this. :-)

Yeah, that was a really good idea :) The proposal here is really far away 
from the actual requirements.

>>> For sparse residency the kernel also needs to know the region 
>>> boundaries to make sure that it keeps sparse mappings around.
>>
>> What?
>
> When userspace creates a new VKBuffer with the 
> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
> sparse mappings in order to ensure that using this buffer without any 
> memory backed mappings doesn't fault the GPU.
>
> Currently, the implementation does this the following way:
>
> 1. Userspace creates a new VKBuffer and hence allocates a portion of 
> the VA space for it. It calls into the kernel indicating the new VA 
> space region and the fact that the region is sparse.
>
> 2. The kernel picks up the region and stores it in the GPUVA manager, 
> the driver creates the corresponding sparse mappings / page table 
> entries.
>
> 3. Userspace might ask the driver to create a couple of memory backed 
> mappings for this particular VA region. The GPUVA manager stores the 
> mapping parameters, the driver creates the corresponding page table 
> entries.
>
> 4. Userspace might ask to unmap all the memory backed mappings from 
> this particular VA region. The GPUVA manager removes the mapping 
> parameters, the driver cleans up the corresponding page table entries. 
> However, the driver also needs to re-create the sparse mappings, since 
> it's a sparse buffer, hence it needs to know the boundaries of the 
> region it needs to create the sparse mappings in.

Again, this is not how things work. First of all, the kernel 
absolutely should *NOT* know about those regions.

What we have inside the kernel is the information about what happens when 
an address X is accessed. On AMD HW this can be:

1. Route to the PCIe bus because the mapped BO is stored in system memory.
2. Route to the internal MC because the mapped BO is stored in local memory.
3. Route to other GPUs in the same hive.
4. Route to some doorbell to kick off other work.
...
x. Ignore writes, return 0 on reads (this is what is used for sparse 
mappings).
x+1. Trigger a recoverable page fault. This is used for things like SVA.
x+2. Trigger a non-recoverable page fault. This is used for things like 
unmapped regions where access is illegal.

All of this, plus some hw specific caching flags.
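
Just to make the point concrete (purely illustrative, not the real amdgpu 
definitions): what the page tables encode is a per-PTE behaviour, not a 
range property, roughly along the lines of:

enum example_pte_action {
        PTE_ROUTE_SYSTEM_MEMORY, /* BO lives in system memory, go over PCIe */
        PTE_ROUTE_LOCAL_MEMORY,  /* BO lives in VRAM, go to the internal MC */
        PTE_ROUTE_PEER_GPU,      /* another GPU in the same hive */
        PTE_ROUTE_DOORBELL,      /* kick off other work */
        PTE_SPARSE,              /* ignore writes, return 0 on reads */
        PTE_FAULT_RECOVERABLE,   /* e.g. SVA */
        PTE_FAULT_FATAL,         /* unmapped, access is illegal */
};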

When Vulkan allocates a sparse VKBuffer what should happen is the following:

1. The Vulkan driver somehow figures out a VA region A..B for the 
buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), but 
essentially is currently driver specific.

2. The kernel gets a request to map the VA range A..B as sparse, meaning 
that it updates the page tables from A..B with the sparse setting.

3. User space asks the kernel to map a couple of memory backings at 
location A+1, A+10, A+15 etc....

4. The VKBuffer is de-allocated; userspace asks the kernel to update 
region A..B to not map anything (usually triggers a non-recoverable fault).

If you want to unify this between hw drivers, I strongly suggest 
starting completely from scratch once more.

First of all, don't think about those mappings as VMAs; that won't work 
because VMAs are usually something large. Think of this as individual 
PTEs controlled by the application, similar to how COW mappings and 
struct pages are handled inside the kernel.

Then I would start with the VA allocation manager. You could probably 
base that on drm_mm. We handle it differently in amdgpu currently, but I 
think this is something we could change.

Then come up with something close to the amdgpu VM system. I'm pretty 
sure that should work for Nouveau and Intel Xe as well. In other words, 
you just have a bunch of very, very small structures which represent 
mappings and a larger structure which combines all mappings of a specific 
type, e.g. all mappings of a BO or all sparse mappings etc.
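
As a very rough sketch of what I mean (hypothetical names, not the actual 
amdgpu structures):

#include <drm/drm_gem.h>
#include <linux/list.h>
#include <linux/rbtree.h>
#include <linux/types.h>

/* hypothetical, for illustration only */
struct example_va_mapping {
        struct rb_node node;        /* indexed by start address in the VM */
        struct list_head list;      /* entry in the owning container below */
        u64 start, last;            /* VA range covered by this mapping */
        u64 offset;                 /* offset into the backing BO, if any */
        u64 flags;                  /* hw dependent flags (caching, sparse, ...) */
};

struct example_mapping_set {
        struct drm_gem_object *obj; /* NULL e.g. for sparse mappings */
        struct list_head mappings;  /* all example_va_mapping of this BO/type */
};

The mapping nodes themselves don't carry any notion of a region; grouping 
happens purely through the containers they are linked into.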

Merging of regions is actually not mandatory. We don't do it in amdgpu 
and can live with the additional mappings pretty well. But I think this 
can differ between drivers.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 15:17               ` Christian König
@ 2023-01-27 20:25                 ` David Airlie
  2023-01-30 12:58                   ` Christian König
  2023-01-27 21:09                 ` Danilo Krummrich
  1 sibling, 1 reply; 75+ messages in thread
From: David Airlie @ 2023-01-27 20:25 UTC (permalink / raw)
  To: Christian König
  Cc: Danilo Krummrich, Matthew Brost, daniel, bskeggs, jason,
	tzimmermann, mripard, corbet, nouveau, linux-kernel, dri-devel,
	linux-doc

On Sat, Jan 28, 2023 at 1:17 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
> > [SNIP]
> >>>>
> >>>> What you want is one component for tracking the VA allocations
> >>>> (drm_mm based) and a different component/interface for tracking the
> >>>> VA mappings (probably rb tree based).
> >>>
> >>> That's what the GPUVA manager is doing. There are gpuva_regions
> >>> which correspond to VA allocations and gpuvas which represent the
> >>> mappings. Both are tracked separately (currently both with a
> >>> separate drm_mm, though). However, the GPUVA manager needs to take
> >>> regions into account when dealing with mappings to make sure the
> >>> GPUVA manager doesn't propose drivers to merge over region
> >>> boundaries. Speaking from userspace PoV, the kernel wouldn't merge
> >>> mappings from different VKBuffer objects even if they're virtually
> >>> and physically contiguous.
> >>
> >> That are two completely different things and shouldn't be handled in
> >> a single component.
> >
> > They are different things, but they're related in a way that for
> > handling the mappings (in particular merging and sparse) the GPUVA
> > manager needs to know the VA allocation (or region) boundaries.
> >
> > I have the feeling there might be a misunderstanding. Userspace is in
> > charge to actually allocate a portion of VA space and manage it. The
> > GPUVA manager just needs to know about those VA space allocations and
> > hence keeps track of them.
> >
> > The GPUVA manager is not meant to be an allocator in the sense of
> > finding and providing a hole for a given request.
> >
> > Maybe the non-ideal choice of using drm_mm was implying something else.
>
> Uff, well long story short that doesn't even remotely match the
> requirements. This way the GPUVA manager won't be usable for a whole
> bunch of use cases.
>
> What we have are mappings which say X needs to point to Y with this and
> hw dependent flags.
>
> The whole idea of having ranges is not going to fly. Neither with AMD
> GPUs and I strongly think not with Intels XA either.
>
> >> We should probably talk about the design of the GPUVA manager once
> >> more when this should be applicable to all GPU drivers.
> >
> > That's what I try to figure out with this RFC, how to make it
> > appicable for all GPU drivers, so I'm happy to discuss this. :-)
>
> Yeah, that was really good idea :) That proposal here is really far away
> from the actual requirements.
>
> >>> For sparse residency the kernel also needs to know the region
> >>> boundaries to make sure that it keeps sparse mappings around.
> >>
> >> What?
> >
> > When userspace creates a new VKBuffer with the
> > VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create
> > sparse mappings in order to ensure that using this buffer without any
> > memory backed mappings doesn't fault the GPU.
> >
> > Currently, the implementation does this the following way:
> >
> > 1. Userspace creates a new VKBuffer and hence allocates a portion of
> > the VA space for it. It calls into the kernel indicating the new VA
> > space region and the fact that the region is sparse.
> >
> > 2. The kernel picks up the region and stores it in the GPUVA manager,
> > the driver creates the corresponding sparse mappings / page table
> > entries.
> >
> > 3. Userspace might ask the driver to create a couple of memory backed
> > mappings for this particular VA region. The GPUVA manager stores the
> > mapping parameters, the driver creates the corresponding page table
> > entries.
> >
> > 4. Userspace might ask to unmap all the memory backed mappings from
> > this particular VA region. The GPUVA manager removes the mapping
> > parameters, the driver cleans up the corresponding page table entries.
> > However, the driver also needs to re-create the sparse mappings, since
> > it's a sparse buffer, hence it needs to know the boundaries of the
> > region it needs to create the sparse mappings in.
>
> Again, this is not how things are working. First of all the kernel
> absolutely should *NOT* know about those regions.
>
> What we have inside the kernel is the information what happens if an
> address X is accessed. On AMD HW this can be:
>
> 1. Route to the PCIe bus because the mapped BO is stored in system memory.
> 2. Route to the internal MC because the mapped BO is stored in local memory.
> 3. Route to other GPUs in the same hive.
> 4. Route to some doorbell to kick of other work.
> ...
> x. Ignore write, return 0 on reads (this is what is used for sparse
> mappings).
> x+1. Trigger a recoverable page fault. This is used for things like SVA.
> x+2. Trigger a non-recoverable page fault. This is used for things like
> unmapped regions where access is illegal.
>
> All this is plus some hw specific caching flags.
>
> When Vulkan allocates a sparse VKBuffer what should happen is the following:
>
> 1. The Vulkan driver somehow figures out a VA region A..B for the
> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), but
> essentially is currently driver specific.

There are NO plans to have drm_mm do VA region management; VA region
management will be in userspace in Mesa. Can we just not bring that up again?
This is for GPU VA tracking, not management; if that makes it easier we
could rename it.

>
> 2. The kernel gets a request to map the VA range A..B as sparse, meaning
> that it updates the page tables from A..B with the sparse setting.
>
> 3. User space asks kernel to map a couple of memory backings at location
> A+1, A+10, A+15 etc....

3.5?

Userspace asks the kernel to unmap A+1 so it can later map something
else in there?

What happens in that case with a set of queued binds: do you just do
a new sparse mapping for A+1, or does userspace decide that?

Dave.

>
> 4. The VKBuffer is de-allocated, userspace asks kernel to update region
> A..B to not map anything (usually triggers a non-recoverable fault).
>
> When you want to unify this between hw drivers I strongly suggest to
> completely start from scratch once more.
>
> First of all don't think about those mappings as VMAs, that won't work
> because VMAs are usually something large. Think of this as individual
> PTEs controlled by the application. similar how COW mappings and struct
> pages are handled inside the kernel.
>
> Then I would start with the VA allocation manager. You could probably
> base that on drm_mm. We handle it differently in amdgpu currently, but I
> think this is something we could change.
>
> Then come up with something close to the amdgpu VM system. I'm pretty
> sure that should work for Nouveau and Intel XA as well. In other words
> you just have a bunch of very very small structures which represents
> mappings and a larger structure which combine all mappings of a specific
> type, e.g. all mappings of a BO or all sparse mappings etc...
>
> Merging of regions is actually not mandatory. We don't do it in amdgpu
> and can live with the additional mappings pretty well. But I think this
> can differ between drivers.
>
> Regards,
> Christian.
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 15:17               ` Christian König
  2023-01-27 20:25                 ` David Airlie
@ 2023-01-27 21:09                 ` Danilo Krummrich
  2023-01-29 18:46                   ` Danilo Krummrich
  1 sibling, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-27 21:09 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 1/27/23 16:17, Christian König wrote:
> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>> [SNIP]
>>>>>
>>>>> What you want is one component for tracking the VA allocations 
>>>>> (drm_mm based) and a different component/interface for tracking the 
>>>>> VA mappings (probably rb tree based).
>>>>
>>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>>> which correspond to VA allocations and gpuvas which represent the 
>>>> mappings. Both are tracked separately (currently both with a 
>>>> separate drm_mm, though). However, the GPUVA manager needs to take 
>>>> regions into account when dealing with mappings to make sure the 
>>>> GPUVA manager doesn't propose drivers to merge over region 
>>>> boundaries. Speaking from userspace PoV, the kernel wouldn't merge 
>>>> mappings from different VKBuffer objects even if they're virtually 
>>>> and physically contiguous.
>>>
>>> That are two completely different things and shouldn't be handled in 
>>> a single component.
>>
>> They are different things, but they're related in a way that for 
>> handling the mappings (in particular merging and sparse) the GPUVA 
>> manager needs to know the VA allocation (or region) boundaries.
>>
>> I have the feeling there might be a misunderstanding. Userspace is in 
>> charge to actually allocate a portion of VA space and manage it. The 
>> GPUVA manager just needs to know about those VA space allocations and 
>> hence keeps track of them.
>>
>> The GPUVA manager is not meant to be an allocator in the sense of 
>> finding and providing a hole for a given request.
>>
>> Maybe the non-ideal choice of using drm_mm was implying something else.
> 
> Uff, well long story short that doesn't even remotely match the 
> requirements. This way the GPUVA manager won't be usable for a whole 
> bunch of use cases.
> 
> What we have are mappings which say X needs to point to Y with this and 
> hw dependent flags.
> 
> The whole idea of having ranges is not going to fly. Neither with AMD 
> GPUs and I strongly think not with Intels XA either.

A range in the sense of the GPUVA manager simply represents a VA space 
allocation (which in the case of Nouveau is done in userspace). Userspace 
allocates the portion of VA space and lets the kernel know about it. The 
current implementation needs that for the reasons named above. So, I see 
no reason why this would work with one GPU but not with another; it's 
just part of the manager's design.

I'm absolutely happy to discuss the details of the manager 
implementation, though.
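
To illustrate, here is a minimal sketch of how regions and mappings 
relate in the manager; the names and fields are made up and heavily 
simplified, so please don't read this as the actual code from the series:

/* Illustrative sketch only; hypothetical, simplified structures. */
#include <stdbool.h>
#include <stdint.h>

struct drm_gem_object;

/* A region mirrors a VA space allocation done by userspace, e.g. the
 * VA range backing a sparse VKBuffer. */
struct va_region {
        uint64_t addr;   /* start of the userspace VA allocation */
        uint64_t range;  /* size of the allocation */
        bool sparse;     /* keep sparse PTEs for unbacked parts */
};

/* A mapping is a memory backed portion inside exactly one region. */
struct va_mapping {
        uint64_t addr;
        uint64_t range;
        uint64_t bo_offset;           /* offset into the GEM object */
        struct drm_gem_object *obj;   /* backing buffer */
};

/* Two virtually contiguous mappings may only be merged if the merge
 * does not cross a region boundary; physical contiguity and matching
 * flags would be checked elsewhere. */
static bool may_merge(const struct va_region *reg,
                      const struct va_mapping *a,
                      const struct va_mapping *b)
{
        if (a->addr + a->range != b->addr)
                return false;
        return a->addr >= reg->addr &&
               b->addr + b->range <= reg->addr + reg->range;
}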

> 
>>> We should probably talk about the design of the GPUVA manager once 
>>> more when this should be applicable to all GPU drivers.
>>
>> That's what I try to figure out with this RFC, how to make it 
>> appicable for all GPU drivers, so I'm happy to discuss this. :-)
> 
> Yeah, that was really good idea :) That proposal here is really far away 
> from the actual requirements.
> 

And those are the ones I'm looking for. Do you mind sharing the 
requirements for amdgpu in particular?

>>>> For sparse residency the kernel also needs to know the region 
>>>> boundaries to make sure that it keeps sparse mappings around.
>>>
>>> What?
>>
>> When userspace creates a new VKBuffer with the 
>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>> sparse mappings in order to ensure that using this buffer without any 
>> memory backed mappings doesn't fault the GPU.
>>
>> Currently, the implementation does this the following way:
>>
>> 1. Userspace creates a new VKBuffer and hence allocates a portion of 
>> the VA space for it. It calls into the kernel indicating the new VA 
>> space region and the fact that the region is sparse.
>>
>> 2. The kernel picks up the region and stores it in the GPUVA manager, 
>> the driver creates the corresponding sparse mappings / page table 
>> entries.
>>
>> 3. Userspace might ask the driver to create a couple of memory backed 
>> mappings for this particular VA region. The GPUVA manager stores the 
>> mapping parameters, the driver creates the corresponding page table 
>> entries.
>>
>> 4. Userspace might ask to unmap all the memory backed mappings from 
>> this particular VA region. The GPUVA manager removes the mapping 
>> parameters, the driver cleans up the corresponding page table entries. 
>> However, the driver also needs to re-create the sparse mappings, since 
>> it's a sparse buffer, hence it needs to know the boundaries of the 
>> region it needs to create the sparse mappings in.
> 
> Again, this is not how things are working. First of all the kernel 
> absolutely should *NOT* know about those regions.
> 
> What we have inside the kernel is the information what happens if an 
> address X is accessed. On AMD HW this can be:
> 
> 1. Route to the PCIe bus because the mapped BO is stored in system memory.
> 2. Route to the internal MC because the mapped BO is stored in local 
> memory.
> 3. Route to other GPUs in the same hive.
> 4. Route to some doorbell to kick of other work.
> ...
> x. Ignore write, return 0 on reads (this is what is used for sparse 
> mappings).
> x+1. Trigger a recoverable page fault. This is used for things like SVA.
> x+2. Trigger a non-recoverable page fault. This is used for things like 
> unmapped regions where access is illegal.
> 
> All this is plus some hw specific caching flags.
> 
> When Vulkan allocates a sparse VKBuffer what should happen is the 
> following:
> 
> 1. The Vulkan driver somehow figures out a VA region A..B for the 
> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), but 
> essentially is currently driver specific.

Right, for Nouveau we have this in userspace as well.

> 
> 2. The kernel gets a request to map the VA range A..B as sparse, meaning 
> that it updates the page tables from A..B with the sparse setting.
> 
> 3. User space asks kernel to map a couple of memory backings at location 
> A+1, A+10, A+15 etc....
> 
> 4. The VKBuffer is de-allocated, userspace asks kernel to update region 
> A..B to not map anything (usually triggers a non-recoverable fault).

Up to this point, this seems to be identical to what I'm doing.

It'd be interesting to know how amdgpu handles everything that 
potentially happens between your 3) and 4). More specifically, how are 
the page tables changed when memory backed mappings are mapped on a 
sparse range? What happens when the memory backed mappings are unmapped, 
but the VKBuffer isn't de-allocated, and hence sparse mappings need to 
be re-deployed?

Let's assume the sparse VKBuffer (and hence the VA space allocation) is 
pretty large. In Nouveau the corresponding PTEs would have a rather huge 
page size to cover this. Now, if small memory backed mappings are mapped 
into this huge sparse buffer, in Nouveau we'd allocate a new PT with a 
correspondingly smaller page size overlaying the sparse mapping's PTEs.

How would this look in amdgpu?

> 
> When you want to unify this between hw drivers I strongly suggest to 
> completely start from scratch once more.
> 
> First of all don't think about those mappings as VMAs, that won't work 
> because VMAs are usually something large. Think of this as individual 
> PTEs controlled by the application. similar how COW mappings and struct 
> pages are handled inside the kernel.

Why do you consider tracking single PTEs superior to tracking VMAs? All 
the properties for a page you mentioned above should be equal across all 
pages of a whole (memory backed) mapping, shouldn't they?

> 
> Then I would start with the VA allocation manager. You could probably 
> base that on drm_mm. We handle it differently in amdgpu currently, but I 
> think this is something we could change.

It was not my intention to come up with an actual allocator for the VA 
space in the sense of finding a free and fitting hole in it.

For Nouveau (and XE, I think) we have this in userspace, and from what 
you've written previously I thought the same applies to amdgpu?

> 
> Then come up with something close to the amdgpu VM system. I'm pretty 
> sure that should work for Nouveau and Intel XA as well. In other words 
> you just have a bunch of very very small structures which represents 
> mappings and a larger structure which combine all mappings of a specific 
> type, e.g. all mappings of a BO or all sparse mappings etc...

Considering what you wrote above I assume that small structures / 
mappings in this paragraph refer to PTEs.

Off the top of my head, I don't really see how this fine grained 
resolution of single PTEs would help implement this in Nouveau. 
Actually, I think it would even complicate the handling of PTs, but I'd 
need to think about this a bit more.

> 
> Merging of regions is actually not mandatory. We don't do it in amdgpu 
> and can live with the additional mappings pretty well. But I think this 
> can differ between drivers.
> 
> Regards,
> Christian.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 21:09                 ` Danilo Krummrich
@ 2023-01-29 18:46                   ` Danilo Krummrich
  2023-01-30 13:02                     ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-29 18:46 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc



On 1/27/23 22:09, Danilo Krummrich wrote:
> On 1/27/23 16:17, Christian König wrote:
>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>> [SNIP]
>>>>>>
>>>>>> What you want is one component for tracking the VA allocations 
>>>>>> (drm_mm based) and a different component/interface for tracking 
>>>>>> the VA mappings (probably rb tree based).
>>>>>
>>>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>>>> which correspond to VA allocations and gpuvas which represent the 
>>>>> mappings. Both are tracked separately (currently both with a 
>>>>> separate drm_mm, though). However, the GPUVA manager needs to take 
>>>>> regions into account when dealing with mappings to make sure the 
>>>>> GPUVA manager doesn't propose drivers to merge over region 
>>>>> boundaries. Speaking from userspace PoV, the kernel wouldn't merge 
>>>>> mappings from different VKBuffer objects even if they're virtually 
>>>>> and physically contiguous.
>>>>
>>>> That are two completely different things and shouldn't be handled in 
>>>> a single component.
>>>
>>> They are different things, but they're related in a way that for 
>>> handling the mappings (in particular merging and sparse) the GPUVA 
>>> manager needs to know the VA allocation (or region) boundaries.
>>>
>>> I have the feeling there might be a misunderstanding. Userspace is in 
>>> charge to actually allocate a portion of VA space and manage it. The 
>>> GPUVA manager just needs to know about those VA space allocations and 
>>> hence keeps track of them.
>>>
>>> The GPUVA manager is not meant to be an allocator in the sense of 
>>> finding and providing a hole for a given request.
>>>
>>> Maybe the non-ideal choice of using drm_mm was implying something else.
>>
>> Uff, well long story short that doesn't even remotely match the 
>> requirements. This way the GPUVA manager won't be usable for a whole 
>> bunch of use cases.
>>
>> What we have are mappings which say X needs to point to Y with this 
>> and hw dependent flags.
>>
>> The whole idea of having ranges is not going to fly. Neither with AMD 
>> GPUs and I strongly think not with Intels XA either.
> 
> A range in the sense of the GPUVA manager simply represents a VA space 
> allocation (which in case of Nouveau is taken in userspace). Userspace 
> allocates the portion of VA space and lets the kernel know about it. The 
> current implementation needs that for the named reasons. So, I think 
> there is no reason why this would work with one GPU, but not with 
> another. It's just part of the design choice of the manager.
> 
> And I'm absolutely happy to discuss the details of the manager 
> implementation though.
> 
>>
>>>> We should probably talk about the design of the GPUVA manager once 
>>>> more when this should be applicable to all GPU drivers.
>>>
>>> That's what I try to figure out with this RFC, how to make it 
>>> appicable for all GPU drivers, so I'm happy to discuss this. :-)
>>
>> Yeah, that was really good idea :) That proposal here is really far 
>> away from the actual requirements.
>>
> 
> And those are the ones I'm looking for. Do you mind sharing the 
> requirements for amdgpu in particular?
> 
>>>>> For sparse residency the kernel also needs to know the region 
>>>>> boundaries to make sure that it keeps sparse mappings around.
>>>>
>>>> What?
>>>
>>> When userspace creates a new VKBuffer with the 
>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>>> sparse mappings in order to ensure that using this buffer without any 
>>> memory backed mappings doesn't fault the GPU.
>>>
>>> Currently, the implementation does this the following way:
>>>
>>> 1. Userspace creates a new VKBuffer and hence allocates a portion of 
>>> the VA space for it. It calls into the kernel indicating the new VA 
>>> space region and the fact that the region is sparse.
>>>
>>> 2. The kernel picks up the region and stores it in the GPUVA manager, 
>>> the driver creates the corresponding sparse mappings / page table 
>>> entries.
>>>
>>> 3. Userspace might ask the driver to create a couple of memory backed 
>>> mappings for this particular VA region. The GPUVA manager stores the 
>>> mapping parameters, the driver creates the corresponding page table 
>>> entries.
>>>
>>> 4. Userspace might ask to unmap all the memory backed mappings from 
>>> this particular VA region. The GPUVA manager removes the mapping 
>>> parameters, the driver cleans up the corresponding page table 
>>> entries. However, the driver also needs to re-create the sparse 
>>> mappings, since it's a sparse buffer, hence it needs to know the 
>>> boundaries of the region it needs to create the sparse mappings in.
>>
>> Again, this is not how things are working. First of all the kernel 
>> absolutely should *NOT* know about those regions.
>>
>> What we have inside the kernel is the information what happens if an 
>> address X is accessed. On AMD HW this can be:
>>
>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>> memory.
>> 2. Route to the internal MC because the mapped BO is stored in local 
>> memory.
>> 3. Route to other GPUs in the same hive.
>> 4. Route to some doorbell to kick of other work.
>> ...
>> x. Ignore write, return 0 on reads (this is what is used for sparse 
>> mappings).
>> x+1. Trigger a recoverable page fault. This is used for things like SVA.
>> x+2. Trigger a non-recoverable page fault. This is used for things 
>> like unmapped regions where access is illegal.
>>
>> All this is plus some hw specific caching flags.
>>
>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>> following:
>>
>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), 
>> but essentially is currently driver specific.
> 
> Right, for Nouveau we have this in userspace as well.
> 
>>
>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>> meaning that it updates the page tables from A..B with the sparse 
>> setting.
>>
>> 3. User space asks kernel to map a couple of memory backings at 
>> location A+1, A+10, A+15 etc....
>>
>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>> region A..B to not map anything (usually triggers a non-recoverable 
>> fault).
> 
> Until here this seems to be identical to what I'm doing.
> 
> It'd be interesting to know how amdgpu handles everything that 
> potentially happens between your 3) and 4). More specifically, how are 
> the page tables changed when memory backed mappings are mapped on a 
> sparse range? What happens when the memory backed mappings are unmapped, 
> but the VKBuffer isn't de-allocated, and hence sparse mappings need to 
> be re-deployed?
> 
> Let's assume the sparse VKBuffer (and hence the VA space allocation) is 
> pretty large. In Nouveau the corresponding PTEs would have a rather huge 
> page size to cover this. Now, if small memory backed mappings are mapped 
> to this huge sparse buffer, in Nouveau we'd allocate a new PT with a 
> corresponding smaller page size overlaying the sparse mappings PTEs.
> 
> How would this look like in amdgpu?
> 
>>
>> When you want to unify this between hw drivers I strongly suggest to 
>> completely start from scratch once more.
>>

I just took some time digging into amdgpu and, surprisingly, aside from 
the gpuva_regions it seems like amdgpu basically does exactly the same 
as I do in the GPU VA manager. As explained, those region boundaries are 
needed for merging only and, depending on the driver, might be useful 
for sparse mappings.

For drivers that don't intend to merge at all and (somehow) are capable 
of dealing with sparse regions without knowing the sparse region's 
boundaries, it'd be easy to make those gpuva_regions optional.
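
E.g. something as simple as an opt-in flag at initialization time could 
do; the following is just a sketch with made-up names, not code from the 
series:

/* Sketch only: drivers could opt in to region tracking at init time. */
#include <stdint.h>

enum va_mgr_flags {
        VA_MGR_REGIONS = 1 << 0,  /* track userspace VA allocations */
        VA_MGR_MERGE   = 1 << 1,  /* propose merges, bounded by regions */
};

struct va_mgr {
        uint64_t     va_start;
        uint64_t     va_range;
        unsigned int flags;
        /* ... separate trees for regions and for mappings ... */
};

static void va_mgr_init(struct va_mgr *mgr, uint64_t start,
                        uint64_t range, unsigned int flags)
{
        mgr->va_start = start;
        mgr->va_range = range;
        mgr->flags = flags;
        /* Drivers not passing VA_MGR_REGIONS never see region
         * boundaries and handle sparse ranges on their own. */
}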

>> First of all don't think about those mappings as VMAs, that won't work 
>> because VMAs are usually something large. Think of this as individual 
>> PTEs controlled by the application. similar how COW mappings and 
>> struct pages are handled inside the kernel.
> 
> Why do you consider tracking single PTEs superior to tracking VMAs? All 
> the properties for a page you mentioned above should be equal for the 
> entirety of pages of a whole (memory backed) mapping, aren't they?
> 
>>
>> Then I would start with the VA allocation manager. You could probably 
>> base that on drm_mm. We handle it differently in amdgpu currently, but 
>> I think this is something we could change.
> 
> It was not my intention to come up with an actual allocator for the VA 
> space in the sense of actually finding a free and fitting hole in the VA 
> space.
> 
> For Nouveau (and XE, I think) we have this in userspace and from what 
> you've written previously I thought the same applies for amdgpu?
> 
>>
>> Then come up with something close to the amdgpu VM system. I'm pretty 
>> sure that should work for Nouveau and Intel XA as well. In other words 
>> you just have a bunch of very very small structures which represents 
>> mappings and a larger structure which combine all mappings of a 
>> specific type, e.g. all mappings of a BO or all sparse mappings etc...
> 
> Considering what you wrote above I assume that small structures / 
> mappings in this paragraph refer to PTEs.
> 
> Immediately, I don't really see how this fine grained resolution of 
> single PTEs would help implementing this in Nouveau. Actually, I think 
> it would even complicate the handling of PTs, but I would need to think 
> about this a bit more.
> 
>>
>> Merging of regions is actually not mandatory. We don't do it in amdgpu 
>> and can live with the additional mappings pretty well. But I think 
>> this can differ between drivers.
>>
>> Regards,
>> Christian.
>>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-27 20:25                 ` David Airlie
@ 2023-01-30 12:58                   ` Christian König
  0 siblings, 0 replies; 75+ messages in thread
From: Christian König @ 2023-01-30 12:58 UTC (permalink / raw)
  To: David Airlie
  Cc: Danilo Krummrich, Matthew Brost, daniel, bskeggs, jason,
	tzimmermann, mripard, corbet, nouveau, linux-kernel, dri-devel,
	linux-doc

Am 27.01.23 um 21:25 schrieb David Airlie:
> [SNIP]
>> What we have inside the kernel is the information what happens if an
>> address X is accessed. On AMD HW this can be:
>>
>> 1. Route to the PCIe bus because the mapped BO is stored in system memory.
>> 2. Route to the internal MC because the mapped BO is stored in local memory.
>> 3. Route to other GPUs in the same hive.
>> 4. Route to some doorbell to kick of other work.
>> ...
>> x. Ignore write, return 0 on reads (this is what is used for sparse
>> mappings).
>> x+1. Trigger a recoverable page fault. This is used for things like SVA.
>> x+2. Trigger a non-recoverable page fault. This is used for things like
>> unmapped regions where access is illegal.
>>
>> All this is plus some hw specific caching flags.
>>
>> When Vulkan allocates a sparse VKBuffer what should happen is the following:
>>
>> 1. The Vulkan driver somehow figures out a VA region A..B for the
>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), but
>> essentially is currently driver specific.
> There are NO plans to have drm_mm do VA region management, VA region
> management will be in userspace in Mesa. Can we just not bring that up again?

If we are talking about Mesa drivers then yes, that should work because 
they can then implement all the hw specific quirks you need for VA 
allocation. If the VA allocation is supposed to be hw independent then 
we have a major problem here.

At least on AMD hw we have four different address spaces, and even if 
you know off-hand which one you want to allocate from, you need to share 
your address space between Vulkan, VA-API and potentially even things 
like ROCm/OpenCL.

If we don't properly do that then the AMD user space tools for debugging 
and profiling (RMV, UMR etc...) won't work any more.

> This is for GPU VA tracking not management if that makes it easier we
> could rename it.
>
>> 2. The kernel gets a request to map the VA range A..B as sparse, meaning
>> that it updates the page tables from A..B with the sparse setting.
>>
>> 3. User space asks kernel to map a couple of memory backings at location
>> A+1, A+10, A+15 etc....
> 3.5?
>
> Userspace asks the kernel to unmap A+1 so it can later map something
> else in there?
>
> What happens in that case, with a set of queued binds, do you just do
> a new sparse mapping for A+1, does userspace decide that?

Yes, exactly that. Essentially there is no unmap operation from the 
kernel PoV.

You just tell the kernel what should happen when the hw tries to resolve 
address X.

This "what should happen" can be: resolve to some buffer memory, be 
ignored (for sparse bindings), or generate a fault. This is the part 
which is most likely common to all drivers.
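
Expressed as code it would roughly be this (just a sketch of the idea, 
not actual amdgpu code):

/* Sketch: per range the kernel only stores "what happens on access". */
enum va_access_behavior {
        VA_ACCESS_MEMORY,  /* resolve to backing buffer memory */
        VA_ACCESS_SPARSE,  /* ignore writes, reads return zero */
        VA_ACCESS_FAULT,   /* raise a (non-)recoverable fault */
        /* ... plus hw specific extras such as debug traps ... */
};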

But then, at least on newer AMD hardware, we also have things like 
raising a debug trap on access and waiting until a debugger tells you to 
continue...

It would be great if we could have the parts of a VA update IOCTL that 
are common to all drivers, e.g. in/out fences, range description (start, 
offset, end, ...) and GEM handle, in a standardized structure, while 
still being able to handle all the hw specific stuff as well.
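
Just to illustrate what I mean, maybe something along these lines (a 
made-up structure, not a concrete UAPI proposal):

/* Illustrative sketch of a common VA update request; hw specific
 * bits would live behind the flags or a chained extension. */
#include <stdint.h>

struct va_update_req {
        uint32_t in_syncobj;   /* wait before applying the update */
        uint32_t out_syncobj;  /* signal once the update is visible */
        uint32_t gem_handle;   /* 0 for sparse or unmap-only updates */
        uint32_t flags;        /* map/unmap/sparse, driver extensions */
        uint64_t bo_offset;    /* offset into the GEM object */
        uint64_t va_start;     /* start of the VA range to update */
        uint64_t va_range;     /* size of the VA range */
        uint64_t ext;          /* chain of hw specific extensions */
};

The driver specific stuff would then hang off the flags and the 
extension chain.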

Christian.

>
> Dave.
>
>> 4. The VKBuffer is de-allocated, userspace asks kernel to update region
>> A..B to not map anything (usually triggers a non-recoverable fault).
>>
>> When you want to unify this between hw drivers I strongly suggest to
>> completely start from scratch once more.
>>
>> First of all don't think about those mappings as VMAs, that won't work
>> because VMAs are usually something large. Think of this as individual
>> PTEs controlled by the application. similar how COW mappings and struct
>> pages are handled inside the kernel.
>>
>> Then I would start with the VA allocation manager. You could probably
>> base that on drm_mm. We handle it differently in amdgpu currently, but I
>> think this is something we could change.
>>
>> Then come up with something close to the amdgpu VM system. I'm pretty
>> sure that should work for Nouveau and Intel XA as well. In other words
>> you just have a bunch of very very small structures which represents
>> mappings and a larger structure which combine all mappings of a specific
>> type, e.g. all mappings of a BO or all sparse mappings etc...
>>
>> Merging of regions is actually not mandatory. We don't do it in amdgpu
>> and can live with the additional mappings pretty well. But I think this
>> can differ between drivers.
>>
>> Regards,
>> Christian.
>>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-29 18:46                   ` Danilo Krummrich
@ 2023-01-30 13:02                     ` Christian König
  2023-01-30 23:38                       ` Danilo Krummrich
  2023-02-01  8:10                       ` [Nouveau] " Dave Airlie
  0 siblings, 2 replies; 75+ messages in thread
From: Christian König @ 2023-01-30 13:02 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

Am 29.01.23 um 19:46 schrieb Danilo Krummrich:
> On 1/27/23 22:09, Danilo Krummrich wrote:
>> On 1/27/23 16:17, Christian König wrote:
>>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>>> [SNIP]
>>>>>>>
>>>>>>> What you want is one component for tracking the VA allocations 
>>>>>>> (drm_mm based) and a different component/interface for tracking 
>>>>>>> the VA mappings (probably rb tree based).
>>>>>>
>>>>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>>>>> which correspond to VA allocations and gpuvas which represent the 
>>>>>> mappings. Both are tracked separately (currently both with a 
>>>>>> separate drm_mm, though). However, the GPUVA manager needs to 
>>>>>> take regions into account when dealing with mappings to make sure 
>>>>>> the GPUVA manager doesn't propose drivers to merge over region 
>>>>>> boundaries. Speaking from userspace PoV, the kernel wouldn't 
>>>>>> merge mappings from different VKBuffer objects even if they're 
>>>>>> virtually and physically contiguous.
>>>>>
>>>>> That are two completely different things and shouldn't be handled 
>>>>> in a single component.
>>>>
>>>> They are different things, but they're related in a way that for 
>>>> handling the mappings (in particular merging and sparse) the GPUVA 
>>>> manager needs to know the VA allocation (or region) boundaries.
>>>>
>>>> I have the feeling there might be a misunderstanding. Userspace is 
>>>> in charge to actually allocate a portion of VA space and manage it. 
>>>> The GPUVA manager just needs to know about those VA space 
>>>> allocations and hence keeps track of them.
>>>>
>>>> The GPUVA manager is not meant to be an allocator in the sense of 
>>>> finding and providing a hole for a given request.
>>>>
>>>> Maybe the non-ideal choice of using drm_mm was implying something 
>>>> else.
>>>
>>> Uff, well long story short that doesn't even remotely match the 
>>> requirements. This way the GPUVA manager won't be usable for a whole 
>>> bunch of use cases.
>>>
>>> What we have are mappings which say X needs to point to Y with this 
>>> and hw dependent flags.
>>>
>>> The whole idea of having ranges is not going to fly. Neither with 
>>> AMD GPUs and I strongly think not with Intels XA either.
>>
>> A range in the sense of the GPUVA manager simply represents a VA 
>> space allocation (which in case of Nouveau is taken in userspace). 
>> Userspace allocates the portion of VA space and lets the kernel know 
>> about it. The current implementation needs that for the named 
>> reasons. So, I think there is no reason why this would work with one 
>> GPU, but not with another. It's just part of the design choice of the 
>> manager.
>>
>> And I'm absolutely happy to discuss the details of the manager 
>> implementation though.
>>
>>>
>>>>> We should probably talk about the design of the GPUVA manager once 
>>>>> more when this should be applicable to all GPU drivers.
>>>>
>>>> That's what I try to figure out with this RFC, how to make it 
>>>> appicable for all GPU drivers, so I'm happy to discuss this. :-)
>>>
>>> Yeah, that was really good idea :) That proposal here is really far 
>>> away from the actual requirements.
>>>
>>
>> And those are the ones I'm looking for. Do you mind sharing the 
>> requirements for amdgpu in particular?
>>
>>>>>> For sparse residency the kernel also needs to know the region 
>>>>>> boundaries to make sure that it keeps sparse mappings around.
>>>>>
>>>>> What?
>>>>
>>>> When userspace creates a new VKBuffer with the 
>>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>>>> sparse mappings in order to ensure that using this buffer without 
>>>> any memory backed mappings doesn't fault the GPU.
>>>>
>>>> Currently, the implementation does this the following way:
>>>>
>>>> 1. Userspace creates a new VKBuffer and hence allocates a portion 
>>>> of the VA space for it. It calls into the kernel indicating the new 
>>>> VA space region and the fact that the region is sparse.
>>>>
>>>> 2. The kernel picks up the region and stores it in the GPUVA 
>>>> manager, the driver creates the corresponding sparse mappings / 
>>>> page table entries.
>>>>
>>>> 3. Userspace might ask the driver to create a couple of memory 
>>>> backed mappings for this particular VA region. The GPUVA manager 
>>>> stores the mapping parameters, the driver creates the corresponding 
>>>> page table entries.
>>>>
>>>> 4. Userspace might ask to unmap all the memory backed mappings from 
>>>> this particular VA region. The GPUVA manager removes the mapping 
>>>> parameters, the driver cleans up the corresponding page table 
>>>> entries. However, the driver also needs to re-create the sparse 
>>>> mappings, since it's a sparse buffer, hence it needs to know the 
>>>> boundaries of the region it needs to create the sparse mappings in.
>>>
>>> Again, this is not how things are working. First of all the kernel 
>>> absolutely should *NOT* know about those regions.
>>>
>>> What we have inside the kernel is the information what happens if an 
>>> address X is accessed. On AMD HW this can be:
>>>
>>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>>> memory.
>>> 2. Route to the internal MC because the mapped BO is stored in local 
>>> memory.
>>> 3. Route to other GPUs in the same hive.
>>> 4. Route to some doorbell to kick of other work.
>>> ...
>>> x. Ignore write, return 0 on reads (this is what is used for sparse 
>>> mappings).
>>> x+1. Trigger a recoverable page fault. This is used for things like 
>>> SVA.
>>> x+2. Trigger a non-recoverable page fault. This is used for things 
>>> like unmapped regions where access is illegal.
>>>
>>> All this is plus some hw specific caching flags.
>>>
>>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>>> following:
>>>
>>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), 
>>> but essentially is currently driver specific.
>>
>> Right, for Nouveau we have this in userspace as well.
>>
>>>
>>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>>> meaning that it updates the page tables from A..B with the sparse 
>>> setting.
>>>
>>> 3. User space asks kernel to map a couple of memory backings at 
>>> location A+1, A+10, A+15 etc....
>>>
>>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>>> region A..B to not map anything (usually triggers a non-recoverable 
>>> fault).
>>
>> Until here this seems to be identical to what I'm doing.
>>
>> It'd be interesting to know how amdgpu handles everything that 
>> potentially happens between your 3) and 4). More specifically, how 
>> are the page tables changed when memory backed mappings are mapped on 
>> a sparse range? What happens when the memory backed mappings are 
>> unmapped, but the VKBuffer isn't de-allocated, and hence sparse 
>> mappings need to be re-deployed?
>>
>> Let's assume the sparse VKBuffer (and hence the VA space allocation) 
>> is pretty large. In Nouveau the corresponding PTEs would have a 
>> rather huge page size to cover this. Now, if small memory backed 
>> mappings are mapped to this huge sparse buffer, in Nouveau we'd 
>> allocate a new PT with a corresponding smaller page size overlaying 
>> the sparse mappings PTEs.
>>
>> How would this look like in amdgpu?
>>
>>>
>>> When you want to unify this between hw drivers I strongly suggest to 
>>> completely start from scratch once more.
>>>
>
> I just took some time digging into amdgpu and, surprisingly, aside 
> from the gpuva_regions it seems like amdgpu basically does exactly the 
> same as I do in the GPU VA manager. As explained, those region 
> boundaries are needed for merging only and, depending on the driver, 
> might be useful for sparse mappings.
>
> For drivers that don't intend to merge at all and (somehow) are 
> capable of dealing with sparse regions without knowing the sparse 
> region's boundaries, it'd be easy to make those gpuva_regions optional.

Yeah, but this then defeats the approach of having the same hw 
independent interface/implementation for all drivers.

Let me ask the other way around: what does the hw implementation of a 
sparse mapping look like on NVIDIA based hardware?

For newer AMD hw it's a flag in the page tables, for older hw it's a 
register where you can specify ranges A..B. We don't really support the 
latter with AMDGPU any more, but from this interface I would guess you 
have the second variant, right?

Christian.

>
>>> First of all don't think about those mappings as VMAs, that won't 
>>> work because VMAs are usually something large. Think of this as 
>>> individual PTEs controlled by the application. similar how COW 
>>> mappings and struct pages are handled inside the kernel.
>>
>> Why do you consider tracking single PTEs superior to tracking VMAs? 
>> All the properties for a page you mentioned above should be equal for 
>> the entirety of pages of a whole (memory backed) mapping, aren't they?
>>
>>>
>>> Then I would start with the VA allocation manager. You could 
>>> probably base that on drm_mm. We handle it differently in amdgpu 
>>> currently, but I think this is something we could change.
>>
>> It was not my intention to come up with an actual allocator for the 
>> VA space in the sense of actually finding a free and fitting hole in 
>> the VA space.
>>
>> For Nouveau (and XE, I think) we have this in userspace and from what 
>> you've written previously I thought the same applies for amdgpu?
>>
>>>
>>> Then come up with something close to the amdgpu VM system. I'm 
>>> pretty sure that should work for Nouveau and Intel XA as well. In 
>>> other words you just have a bunch of very very small structures 
>>> which represents mappings and a larger structure which combine all 
>>> mappings of a specific type, e.g. all mappings of a BO or all sparse 
>>> mappings etc...
>>
>> Considering what you wrote above I assume that small structures / 
>> mappings in this paragraph refer to PTEs.
>>
>> Immediately, I don't really see how this fine grained resolution of 
>> single PTEs would help implementing this in Nouveau. Actually, I 
>> think it would even complicate the handling of PTs, but I would need 
>> to think about this a bit more.
>>
>>>
>>> Merging of regions is actually not mandatory. We don't do it in 
>>> amdgpu and can live with the additional mappings pretty well. But I 
>>> think this can differ between drivers.
>>>
>>> Regards,
>>> Christian.
>>>
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-30 13:02                     ` Christian König
@ 2023-01-30 23:38                       ` Danilo Krummrich
  2023-02-01  8:10                       ` [Nouveau] " Dave Airlie
  1 sibling, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-01-30 23:38 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 1/30/23 14:02, Christian König wrote:
> Am 29.01.23 um 19:46 schrieb Danilo Krummrich:
>> On 1/27/23 22:09, Danilo Krummrich wrote:
>>> On 1/27/23 16:17, Christian König wrote:
>>>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>>>> [SNIP]
>>>>>>>>
>>>>>>>> What you want is one component for tracking the VA allocations 
>>>>>>>> (drm_mm based) and a different component/interface for tracking 
>>>>>>>> the VA mappings (probably rb tree based).
>>>>>>>
>>>>>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>>>>>> which correspond to VA allocations and gpuvas which represent the 
>>>>>>> mappings. Both are tracked separately (currently both with a 
>>>>>>> separate drm_mm, though). However, the GPUVA manager needs to 
>>>>>>> take regions into account when dealing with mappings to make sure 
>>>>>>> the GPUVA manager doesn't propose drivers to merge over region 
>>>>>>> boundaries. Speaking from userspace PoV, the kernel wouldn't 
>>>>>>> merge mappings from different VKBuffer objects even if they're 
>>>>>>> virtually and physically contiguous.
>>>>>>
>>>>>> That are two completely different things and shouldn't be handled 
>>>>>> in a single component.
>>>>>
>>>>> They are different things, but they're related in a way that for 
>>>>> handling the mappings (in particular merging and sparse) the GPUVA 
>>>>> manager needs to know the VA allocation (or region) boundaries.
>>>>>
>>>>> I have the feeling there might be a misunderstanding. Userspace is 
>>>>> in charge to actually allocate a portion of VA space and manage it. 
>>>>> The GPUVA manager just needs to know about those VA space 
>>>>> allocations and hence keeps track of them.
>>>>>
>>>>> The GPUVA manager is not meant to be an allocator in the sense of 
>>>>> finding and providing a hole for a given request.
>>>>>
>>>>> Maybe the non-ideal choice of using drm_mm was implying something 
>>>>> else.
>>>>
>>>> Uff, well long story short that doesn't even remotely match the 
>>>> requirements. This way the GPUVA manager won't be usable for a whole 
>>>> bunch of use cases.
>>>>
>>>> What we have are mappings which say X needs to point to Y with this 
>>>> and hw dependent flags.
>>>>
>>>> The whole idea of having ranges is not going to fly. Neither with 
>>>> AMD GPUs and I strongly think not with Intels XA either.
>>>
>>> A range in the sense of the GPUVA manager simply represents a VA 
>>> space allocation (which in case of Nouveau is taken in userspace). 
>>> Userspace allocates the portion of VA space and lets the kernel know 
>>> about it. The current implementation needs that for the named 
>>> reasons. So, I think there is no reason why this would work with one 
>>> GPU, but not with another. It's just part of the design choice of the 
>>> manager.
>>>
>>> And I'm absolutely happy to discuss the details of the manager 
>>> implementation though.
>>>
>>>>
>>>>>> We should probably talk about the design of the GPUVA manager once 
>>>>>> more when this should be applicable to all GPU drivers.
>>>>>
>>>>> That's what I try to figure out with this RFC, how to make it 
>>>>> appicable for all GPU drivers, so I'm happy to discuss this. :-)
>>>>
>>>> Yeah, that was really good idea :) That proposal here is really far 
>>>> away from the actual requirements.
>>>>
>>>
>>> And those are the ones I'm looking for. Do you mind sharing the 
>>> requirements for amdgpu in particular?
>>>
>>>>>>> For sparse residency the kernel also needs to know the region 
>>>>>>> boundaries to make sure that it keeps sparse mappings around.
>>>>>>
>>>>>> What?
>>>>>
>>>>> When userspace creates a new VKBuffer with the 
>>>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>>>>> sparse mappings in order to ensure that using this buffer without 
>>>>> any memory backed mappings doesn't fault the GPU.
>>>>>
>>>>> Currently, the implementation does this the following way:
>>>>>
>>>>> 1. Userspace creates a new VKBuffer and hence allocates a portion 
>>>>> of the VA space for it. It calls into the kernel indicating the new 
>>>>> VA space region and the fact that the region is sparse.
>>>>>
>>>>> 2. The kernel picks up the region and stores it in the GPUVA 
>>>>> manager, the driver creates the corresponding sparse mappings / 
>>>>> page table entries.
>>>>>
>>>>> 3. Userspace might ask the driver to create a couple of memory 
>>>>> backed mappings for this particular VA region. The GPUVA manager 
>>>>> stores the mapping parameters, the driver creates the corresponding 
>>>>> page table entries.
>>>>>
>>>>> 4. Userspace might ask to unmap all the memory backed mappings from 
>>>>> this particular VA region. The GPUVA manager removes the mapping 
>>>>> parameters, the driver cleans up the corresponding page table 
>>>>> entries. However, the driver also needs to re-create the sparse 
>>>>> mappings, since it's a sparse buffer, hence it needs to know the 
>>>>> boundaries of the region it needs to create the sparse mappings in.
>>>>
>>>> Again, this is not how things are working. First of all the kernel 
>>>> absolutely should *NOT* know about those regions.
>>>>
>>>> What we have inside the kernel is the information what happens if an 
>>>> address X is accessed. On AMD HW this can be:
>>>>
>>>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>>>> memory.
>>>> 2. Route to the internal MC because the mapped BO is stored in local 
>>>> memory.
>>>> 3. Route to other GPUs in the same hive.
>>>> 4. Route to some doorbell to kick of other work.
>>>> ...
>>>> x. Ignore write, return 0 on reads (this is what is used for sparse 
>>>> mappings).
>>>> x+1. Trigger a recoverable page fault. This is used for things like 
>>>> SVA.
>>>> x+2. Trigger a non-recoverable page fault. This is used for things 
>>>> like unmapped regions where access is illegal.
>>>>
>>>> All this is plus some hw specific caching flags.
>>>>
>>>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>>>> following:
>>>>
>>>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>>>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), 
>>>> but essentially is currently driver specific.
>>>
>>> Right, for Nouveau we have this in userspace as well.
>>>
>>>>
>>>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>>>> meaning that it updates the page tables from A..B with the sparse 
>>>> setting.
>>>>
>>>> 3. User space asks kernel to map a couple of memory backings at 
>>>> location A+1, A+10, A+15 etc....
>>>>
>>>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>>>> region A..B to not map anything (usually triggers a non-recoverable 
>>>> fault).
>>>
>>> Until here this seems to be identical to what I'm doing.
>>>
>>> It'd be interesting to know how amdgpu handles everything that 
>>> potentially happens between your 3) and 4). More specifically, how 
>>> are the page tables changed when memory backed mappings are mapped on 
>>> a sparse range? What happens when the memory backed mappings are 
>>> unmapped, but the VKBuffer isn't de-allocated, and hence sparse 
>>> mappings need to be re-deployed?
>>>
>>> Let's assume the sparse VKBuffer (and hence the VA space allocation) 
>>> is pretty large. In Nouveau the corresponding PTEs would have a 
>>> rather huge page size to cover this. Now, if small memory backed 
>>> mappings are mapped to this huge sparse buffer, in Nouveau we'd 
>>> allocate a new PT with a corresponding smaller page size overlaying 
>>> the sparse mappings PTEs.
>>>
>>> How would this look like in amdgpu?
>>>
>>>>
>>>> When you want to unify this between hw drivers I strongly suggest to 
>>>> completely start from scratch once more.
>>>>
>>
>> I just took some time digging into amdgpu and, surprisingly, aside 
>> from the gpuva_regions it seems like amdgpu basically does exactly the 
>> same as I do in the GPU VA manager. As explained, those region 
>> boundaries are needed for merging only and, depending on the driver, 
>> might be useful for sparse mappings.
>>
>> For drivers that don't intend to merge at all and (somehow) are 
>> capable of dealing with sparse regions without knowing the sparse 
>> region's boundaries, it'd be easy to make those gpuva_regions optional.
> 
> Yeah, but this then defeats the approach of having the same hw 
> independent interface/implementation for all drivers.

That's probably a question of interpretation; I'd rather see it as an 
optional feature. Probably 80% to 90% of the code is for tracking 
mappings, generating split / merge steps on bind / unbind and connecting 
the mappings to GEM objects. That part would be the same for all 
drivers; some might opt in to additionally tracking regions on top of it 
and others won't.
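
To give an idea of what that common part looks like from a driver's 
point of view, roughly something like the following (hypothetical 
callback names, heavily simplified):

/* Sketch: on a bind/unbind request the manager walks the affected
 * range and hands the individual steps to the driver. */
#include <stdbool.h>
#include <stdint.h>

struct drm_gem_object;

struct va_op_map {
        uint64_t addr, range, bo_offset;
        struct drm_gem_object *obj;
};

struct va_op_unmap {
        uint64_t addr, range;
        bool keep_sparse;  /* sparse region: re-deploy sparse PTEs */
};

struct va_ops {
        int (*map)(void *priv, const struct va_op_map *op);
        int (*unmap)(void *priv, const struct va_op_unmap *op);
        /* a partial unmap splits an existing mapping */
        int (*remap)(void *priv, const struct va_op_unmap *unmap,
                     const struct va_op_map *prev,
                     const struct va_op_map *next);
};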

> 
> Let me ask the other way around how does the hw implementation of a 
> sparse mapping looks like for NVidia based hardware?
> 
> For newer AMD hw its a flag in the page tables, for older hw its a 
> register where you can specify ranges A..B. We don't really support the 
> later with AMDGPU any more, but from this interface I would guess you 
> have the second variant, right?

No, it's a flag in the PTEs as well.

However, for a rather huge sparse region the sparse PTEs might have a 
different (larger) page size than the PTEs of the (smaller) memory 
backed mappings. Hence, when trying to map a small memory backed mapping 
within a huge sparse region, we can *not* simply change the sparse PTE 
to point to actual memory; instead we create a new PT with a smaller 
page size that overlays the page table containing the large sparse PTEs.

In such a situation, tracking the whole (sparse) region (representing 
the whole VA allocation) separately comes in handy.
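
In (heavily simplified) pseudo code, with all names made up and the page 
table details hidden behind hypothetical helpers, the flow would be 
roughly:

/* Sketch only; not actual driver code. */
#include <stdint.h>

struct va_region  { uint64_t addr, range; };  /* the VA allocation */
struct va_mapping { uint64_t addr, range, bo_offset; };

/* Hypothetical driver internals manipulating the page tables. */
int overlay_pt_map(struct va_region *reg, uint64_t addr, uint64_t range,
                   uint64_t bo_offset);
void overlay_pt_unmap(struct va_region *reg, uint64_t addr,
                      uint64_t range);
void redeploy_sparse_ptes(struct va_region *reg, uint64_t addr,
                          uint64_t range);

static int map_in_sparse_region(struct va_region *reg,
                                struct va_mapping *m)
{
        /* The region's PTEs use a huge page size and carry the sparse
         * flag; a small memory backed mapping can't simply flip one of
         * them. Instead, set up a PT with a smaller page size that
         * overlays the affected huge PTEs and points to memory. */
        return overlay_pt_map(reg, m->addr, m->range, m->bo_offset);
}

static void unmap_in_sparse_region(struct va_region *reg,
                                   struct va_mapping *m)
{
        overlay_pt_unmap(reg, m->addr, m->range);

        /* The VKBuffer still exists, so the range must read back as
         * sparse again; the region boundaries tell us where the huge
         * sparse PTEs need to be (re-)deployed. */
        redeploy_sparse_ptes(reg, m->addr, m->range);
}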

And of course, as mentioned, tracking regions gives us the bounds for 
merging, which e.g. might be useful to pick a greater page size for 
merged mappings. It might also reduce the number of mappings to track a 
little, though that would probably only be relevant if we actually have 
quite a few to merge.

> 
> Christian.
> 
>>
>>>> First of all don't think about those mappings as VMAs, that won't 
>>>> work because VMAs are usually something large. Think of this as 
>>>> individual PTEs controlled by the application. similar how COW 
>>>> mappings and struct pages are handled inside the kernel.
>>>
>>> Why do you consider tracking single PTEs superior to tracking VMAs? 
>>> All the properties for a page you mentioned above should be equal for 
>>> the entirety of pages of a whole (memory backed) mapping, aren't they?
>>>
>>>>
>>>> Then I would start with the VA allocation manager. You could 
>>>> probably base that on drm_mm. We handle it differently in amdgpu 
>>>> currently, but I think this is something we could change.
>>>
>>> It was not my intention to come up with an actual allocator for the 
>>> VA space in the sense of actually finding a free and fitting hole in 
>>> the VA space.
>>>
>>> For Nouveau (and XE, I think) we have this in userspace and from what 
>>> you've written previously I thought the same applies for amdgpu?
>>>
>>>>
>>>> Then come up with something close to the amdgpu VM system. I'm 
>>>> pretty sure that should work for Nouveau and Intel XA as well. In 
>>>> other words you just have a bunch of very very small structures 
>>>> which represents mappings and a larger structure which combine all 
>>>> mappings of a specific type, e.g. all mappings of a BO or all sparse 
>>>> mappings etc...
>>>
>>> Considering what you wrote above I assume that small structures / 
>>> mappings in this paragraph refer to PTEs.
>>>
>>> Immediately, I don't really see how this fine grained resolution of 
>>> single PTEs would help implementing this in Nouveau. Actually, I 
>>> think it would even complicate the handling of PTs, but I would need 
>>> to think about this a bit more.
>>>
>>>>
>>>> Merging of regions is actually not mandatory. We don't do it in 
>>>> amdgpu and can live with the additional mappings pretty well. But I 
>>>> think this can differ between drivers.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-01-30 13:02                     ` Christian König
  2023-01-30 23:38                       ` Danilo Krummrich
@ 2023-02-01  8:10                       ` Dave Airlie
  2023-02-02 11:53                         ` Christian König
  1 sibling, 1 reply; 75+ messages in thread
From: Dave Airlie @ 2023-02-01  8:10 UTC (permalink / raw)
  To: Christian König
  Cc: Danilo Krummrich, Matthew Brost, daniel, corbet, dri-devel,
	linux-doc, linux-kernel, mripard, bskeggs, jason, nouveau,
	airlied

On Mon, 30 Jan 2023 at 23:02, Christian König <christian.koenig@amd.com> wrote:
>
> Am 29.01.23 um 19:46 schrieb Danilo Krummrich:
> > On 1/27/23 22:09, Danilo Krummrich wrote:
> >> On 1/27/23 16:17, Christian König wrote:
> >>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
> >>>> [SNIP]
> >>>>>>>
> >>>>>>> What you want is one component for tracking the VA allocations
> >>>>>>> (drm_mm based) and a different component/interface for tracking
> >>>>>>> the VA mappings (probably rb tree based).
> >>>>>>
> >>>>>> That's what the GPUVA manager is doing. There are gpuva_regions
> >>>>>> which correspond to VA allocations and gpuvas which represent the
> >>>>>> mappings. Both are tracked separately (currently both with a
> >>>>>> separate drm_mm, though). However, the GPUVA manager needs to
> >>>>>> take regions into account when dealing with mappings to make sure
> >>>>>> the GPUVA manager doesn't propose drivers to merge over region
> >>>>>> boundaries. Speaking from userspace PoV, the kernel wouldn't
> >>>>>> merge mappings from different VKBuffer objects even if they're
> >>>>>> virtually and physically contiguous.
> >>>>>
> >>>>> That are two completely different things and shouldn't be handled
> >>>>> in a single component.
> >>>>
> >>>> They are different things, but they're related in a way that for
> >>>> handling the mappings (in particular merging and sparse) the GPUVA
> >>>> manager needs to know the VA allocation (or region) boundaries.
> >>>>
> >>>> I have the feeling there might be a misunderstanding. Userspace is
> >>>> in charge to actually allocate a portion of VA space and manage it.
> >>>> The GPUVA manager just needs to know about those VA space
> >>>> allocations and hence keeps track of them.
> >>>>
> >>>> The GPUVA manager is not meant to be an allocator in the sense of
> >>>> finding and providing a hole for a given request.
> >>>>
> >>>> Maybe the non-ideal choice of using drm_mm was implying something
> >>>> else.
> >>>
> >>> Uff, well long story short that doesn't even remotely match the
> >>> requirements. This way the GPUVA manager won't be usable for a whole
> >>> bunch of use cases.
> >>>
> >>> What we have are mappings which say X needs to point to Y with this
> >>> and hw dependent flags.
> >>>
> >>> The whole idea of having ranges is not going to fly. Neither with
> >>> AMD GPUs and I strongly think not with Intels XA either.
> >>
> >> A range in the sense of the GPUVA manager simply represents a VA
> >> space allocation (which in case of Nouveau is taken in userspace).
> >> Userspace allocates the portion of VA space and lets the kernel know
> >> about it. The current implementation needs that for the named
> >> reasons. So, I think there is no reason why this would work with one
> >> GPU, but not with another. It's just part of the design choice of the
> >> manager.
> >>
> >> And I'm absolutely happy to discuss the details of the manager
> >> implementation though.
> >>
> >>>
> >>>>> We should probably talk about the design of the GPUVA manager once
> >>>>> more when this should be applicable to all GPU drivers.
> >>>>
> >>>> That's what I try to figure out with this RFC, how to make it
> >>>> appicable for all GPU drivers, so I'm happy to discuss this. :-)
> >>>
> >>> Yeah, that was really good idea :) That proposal here is really far
> >>> away from the actual requirements.
> >>>
> >>
> >> And those are the ones I'm looking for. Do you mind sharing the
> >> requirements for amdgpu in particular?
> >>
> >>>>>> For sparse residency the kernel also needs to know the region
> >>>>>> boundaries to make sure that it keeps sparse mappings around.
> >>>>>
> >>>>> What?
> >>>>
> >>>> When userspace creates a new VKBuffer with the
> >>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create
> >>>> sparse mappings in order to ensure that using this buffer without
> >>>> any memory backed mappings doesn't fault the GPU.
> >>>>
> >>>> Currently, the implementation does this the following way:
> >>>>
> >>>> 1. Userspace creates a new VKBuffer and hence allocates a portion
> >>>> of the VA space for it. It calls into the kernel indicating the new
> >>>> VA space region and the fact that the region is sparse.
> >>>>
> >>>> 2. The kernel picks up the region and stores it in the GPUVA
> >>>> manager, the driver creates the corresponding sparse mappings /
> >>>> page table entries.
> >>>>
> >>>> 3. Userspace might ask the driver to create a couple of memory
> >>>> backed mappings for this particular VA region. The GPUVA manager
> >>>> stores the mapping parameters, the driver creates the corresponding
> >>>> page table entries.
> >>>>
> >>>> 4. Userspace might ask to unmap all the memory backed mappings from
> >>>> this particular VA region. The GPUVA manager removes the mapping
> >>>> parameters, the driver cleans up the corresponding page table
> >>>> entries. However, the driver also needs to re-create the sparse
> >>>> mappings, since it's a sparse buffer, hence it needs to know the
> >>>> boundaries of the region it needs to create the sparse mappings in.
> >>>
> >>> Again, this is not how things are working. First of all the kernel
> >>> absolutely should *NOT* know about those regions.
> >>>
> >>> What we have inside the kernel is the information what happens if an
> >>> address X is accessed. On AMD HW this can be:
> >>>
> >>> 1. Route to the PCIe bus because the mapped BO is stored in system
> >>> memory.
> >>> 2. Route to the internal MC because the mapped BO is stored in local
> >>> memory.
> >>> 3. Route to other GPUs in the same hive.
> >>> 4. Route to some doorbell to kick of other work.
> >>> ...
> >>> x. Ignore write, return 0 on reads (this is what is used for sparse
> >>> mappings).
> >>> x+1. Trigger a recoverable page fault. This is used for things like
> >>> SVA.
> >>> x+2. Trigger a non-recoverable page fault. This is used for things
> >>> like unmapped regions where access is illegal.
> >>>
> >>> All this is plus some hw specific caching flags.
> >>>
> >>> When Vulkan allocates a sparse VKBuffer what should happen is the
> >>> following:
> >>>
> >>> 1. The Vulkan driver somehow figures out a VA region A..B for the
> >>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm),
> >>> but essentially is currently driver specific.
> >>
> >> Right, for Nouveau we have this in userspace as well.
> >>
> >>>
> >>> 2. The kernel gets a request to map the VA range A..B as sparse,
> >>> meaning that it updates the page tables from A..B with the sparse
> >>> setting.
> >>>
> >>> 3. User space asks kernel to map a couple of memory backings at
> >>> location A+1, A+10, A+15 etc....
> >>>
> >>> 4. The VKBuffer is de-allocated, userspace asks kernel to update
> >>> region A..B to not map anything (usually triggers a non-recoverable
> >>> fault).
> >>
> >> Up to this point this seems to be identical to what I'm doing.
> >>
> >> It'd be interesting to know how amdgpu handles everything that
> >> potentially happens between your 3) and 4). More specifically, how
> >> are the page tables changed when memory backed mappings are mapped on
> >> a sparse range? What happens when the memory backed mappings are
> >> unmapped, but the VKBuffer isn't de-allocated, and hence sparse
> >> mappings need to be re-deployed?
> >>
> >> Let's assume the sparse VKBuffer (and hence the VA space allocation)
> >> is pretty large. In Nouveau the corresponding PTEs would have a
> >> rather huge page size to cover this. Now, if small memory backed
> >> mappings are mapped to this huge sparse buffer, in Nouveau we'd
> >> allocate a new PT with a corresponding smaller page size overlaying
> >> the sparse mappings' PTEs.
> >>
> >> How would this look in amdgpu?
> >>
> >>>
> >>> When you want to unify this between hw drivers I strongly suggest
> >>> starting completely from scratch once more.
> >>>
> >
> > I just took some time digging into amdgpu and, surprisingly, aside
> > from the gpuva_regions it seems like amdgpu basically does exactly the
> > same as I do in the GPU VA manager. As explained, those region
> > boundaries are needed for merging only and, depending on the driver,
> > might be useful for sparse mappings.
> >
> > For drivers that don't intend to merge at all and (somehow) are
> > capable of dealing with sparse regions without knowing the sparse
> > region's boundaries, it'd be easy to make those gpuva_regions optional.
>
> Yeah, but this then defeats the approach of having the same hw
> independent interface/implementation for all drivers.

I think you are running a few steps ahead here. The plan isn't to have
an independent interface, it's to provide a set of routines and
tracking that will be consistent across drivers, so that all drivers,
once using them, will operate in mostly the same fashion with respect
to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
amdgpu and freedreno, which I think end up operating slightly
differently around lifetimes. I'd like to save future driver writers
the effort of dealing with those decisions, and this should drive
their user API design so as to enable Vulkan sparse bindings.

Now, if merging is a feature that makes sense to one driver, maybe it
makes sense to all. However, there may be reasons amdgpu gets away
without merging that other drivers might not benefit from, and there
might also be a benefit to amdgpu from merging that you haven't looked
at yet, so I think we could leave merging as an optional extra driver
knob here. The userspace API should operate the same; it would just be
the GPU page tables that would end up with different sizes.

Dave.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-01  8:10                       ` [Nouveau] " Dave Airlie
@ 2023-02-02 11:53                         ` Christian König
  2023-02-02 18:31                           ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-02 11:53 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Danilo Krummrich, Matthew Brost, daniel, corbet, dri-devel,
	linux-doc, linux-kernel, mripard, bskeggs, jason, nouveau,
	airlied

On 2/1/23 09:10, Dave Airlie wrote:
> [SNIP]
>>> For drivers that don't intend to merge at all and (somehow) are
>>> capable of dealing with sparse regions without knowing the sparse
>>> region's boundaries, it'd be easy to make those gpuva_regions optional.
>> Yeah, but this then defeats the approach of having the same hw
>> independent interface/implementation for all drivers.
> I think you are running a few steps ahead here. The plan isn't to have
> an independent interface, it's to provide a set of routines and
> tracking that will be consistent across drivers, so that all drivers,
> once using them, will operate in mostly the same fashion with respect
> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
> amdgpu and freedreno, which I think end up operating slightly
> differently around lifetimes. I'd like to save future driver writers
> the effort of dealing with those decisions, and this should drive
> their user API design so as to enable Vulkan sparse bindings.

Ok in this case I'm pretty sure this is *NOT* a good idea.

See, this means that we define the UAPI implicitly by telling drivers
to use a common framework for their VM implementation, which then
results in behavior A, B, C, D....

If a driver strays away from this common framework because it has
different requirements based on how its hw works, you certainly get
different behavior again (and you have tons of hw specific requirements
in here).

If we want to have some common handling among drivers (which I totally
agree makes sense), then what we should do instead is define the UAPI
explicitly.

For example we could have a DRM_IOCTL_GPU_VM which takes both
driver-independent as well as driver-dependent information and then has
the documented behavior (roughly sketched below):
a) VAs do (or don't) vanish automatically when the GEM handle is closed.
b) GEM BOs do (or don't) get an additional reference for each VM they
are used in.
c) Can handle some common use cases driver-independently (BO mappings,
readonly, writeonly, sparse etc...).
d) Has a well-defined behavior when the operation is executed
asynchronously, e.g. in/out fences.
e) Can still handle hw specific stuff like (for example) trap on access
etc....
...
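
To make this a bit more concrete, here is a very rough sketch of what
the argument structure of such an ioctl could look like. All names and
fields below are made up purely for illustration; this is not an actual
proposal:

/* Purely illustrative sketch, not actual UAPI. */
struct drm_gpu_vm_bind {
	__u64 va_addr;     /* start of the VA range to operate on */
	__u64 va_range;    /* size of the VA range */
	__u32 handle;      /* GEM handle, 0 for sparse or unmap */
	__u32 op;          /* c): MAP, UNMAP, MAP_SPARSE, ... */
	__u64 bo_offset;   /* offset into the GEM object */
	__u32 flags;       /* c): READONLY, WRITEONLY, ... */
	__u32 num_syncs;   /* d): number of in/out fences for async ops */
	__u64 syncs;       /* d): pointer to an array of syncobj handles */
	__u64 driver_data; /* e): pointer to hw specific extension data */
};

The hard part is of course deciding which fields are common and which
are hw specific, but something along these lines would make c), d) and
e) explicit in the UAPI, while a) and b) would simply be documented
semantics of the ioctl.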

Especially d) is what Bas and I have pretty much already created a
prototype of for the amdgpu-specific IOCTL, but essentially this is
completely driver independent and actually the more complex stuff.
Compared to that, common lifetime of BOs is just nice to have.

I strongly think we should concentrate on getting this right as well.

> Now, if merging is a feature that makes sense to one driver, maybe it
> makes sense to all. However, there may be reasons amdgpu gets away
> without merging that other drivers might not benefit from, and there
> might also be a benefit to amdgpu from merging that you haven't looked
> at yet, so I think we could leave merging as an optional extra driver
> knob here. The userspace API should operate the same; it would just be
> the GPU page tables that would end up with different sizes.

Yeah, agree completely. The point is that we should not have complexity 
inside the kernel which is not necessarily needed in the kernel.

So merging or not is something we have gone back and forth on for
amdgpu: on the one hand it reduces the memory footprint of the
housekeeping overhead, on the other hand it makes the handling more
complex and error prone and uses a few more CPU cycles.

For amdgpu, merging is mostly beneficial when you can get rid of a whole
page table layer in the hierarchy, but for this you need to merge at
least 2MiB or 1GiB together (with 4KiB pages and 512 entries per page
table, one full lowest-level table covers exactly 2MiB). And since that
case doesn't happen that often we stopped doing it.

But for my understanding, why do you need the ranges for the merging?
Isn't it sufficient to check that the mappings have the same type,
flags, BO, or whatever is backing them?
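
I.e. I'd expect a check along the lines of the following to be enough
(just a sketch, the structure and field names are made up and not taken
from your patch):

/* Illustrative only, not the actual drm_gpuva layout. */
struct va_mapping {
	u64 addr, range;            /* VA start and size */
	struct drm_gem_object *obj; /* backing BO */
	u64 offset;                 /* offset into the BO */
	u32 flags;                  /* mapping type / flags */
};

static bool mappings_mergeable(const struct va_mapping *a,
			       const struct va_mapping *b)
{
	return a->addr + a->range == b->addr &&     /* virtually contiguous */
	       a->obj == b->obj &&                  /* same backing BO */
	       a->offset + a->range == b->offset && /* contiguous in the BO */
	       a->flags == b->flags;                /* same type/flags */
}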

Regards,
Christian.


>
> Dave.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-02 11:53                         ` Christian König
@ 2023-02-02 18:31                           ` Danilo Krummrich
  2023-02-06  9:48                             ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-02 18:31 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 2/2/23 12:53, Christian König wrote:
> On 2/1/23 09:10, Dave Airlie wrote:
>> [SNIP]
>>>> For drivers that don't intend to merge at all and (somehow) are
>>>> capable of dealing with sparse regions without knowing the sparse
>>>> region's boundaries, it'd be easy to make those gpuva_regions optional.
>>> Yeah, but this then defeats the approach of having the same hw
>>> independent interface/implementation for all drivers.
>> I think you are running a few steps ahead here. The plan isn't to have
>> an independent interface, it's to provide a set of routines and
>> tracking that will be consistent across drivers, so that all drivers,
>> once using them, will operate in mostly the same fashion with respect
>> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
>> amdgpu and freedreno, which I think end up operating slightly
>> differently around lifetimes. I'd like to save future driver writers
>> the effort of dealing with those decisions, and this should drive
>> their user API design so as to enable Vulkan sparse bindings.
> 
> Ok in this case I'm pretty sure this is *NOT* a good idea.
> 
> See, this means that we define the UAPI implicitly by telling drivers
> to use a common framework for their VM implementation, which then
> results in behavior A, B, C, D....
> 
> If a driver strays away from this common framework because it has
> different requirements based on how its hw works, you certainly get
> different behavior again (and you have tons of hw specific requirements
> in here).
> 
> If we want to have some common handling among drivers (which I totally
> agree makes sense), then what we should do instead is define the UAPI
> explicitly.

By asking this I don't want to say I'm against the idea, I'm just
wondering how it becomes easier to deal with "tons of hw specific
requirements" by generalizing things even more.

What makes us think that we'd do a better job of considering all hw
specific requirements with a unified UAPI than with a more lightweight
generic component for tracking VA mappings?

Also, wouldn't we need something like the GPUVA manager as part of a 
unified UAPI?

> 
> For example we could have a DRM_IOCTL_GPU_VM which takes both
> driver-independent as well as driver-dependent information and then has
> the documented behavior (roughly sketched below):
> a) VAs do (or don't) vanish automatically when the GEM handle is closed.
> b) GEM BOs do (or don't) get an additional reference for each VM they
> are used in.
> c) Can handle some common use cases driver-independently (BO mappings,
> readonly, writeonly, sparse etc...).
> d) Has a well-defined behavior when the operation is executed
> asynchronously, e.g. in/out fences.
> e) Can still handle hw specific stuff like (for example) trap on access
> etc....
> ...
> 
> Especially d) is what Bas and I have pretty much already created a
> prototype of for the amdgpu-specific IOCTL, but essentially this is
> completely driver independent and actually the more complex stuff.
> Compared to that, common lifetime of BOs is just nice to have.
> 
> I strongly think we should concentrate on getting this right as well.
> 
>> Now, if merging is a feature that makes sense to one driver, maybe it
>> makes sense to all. However, there may be reasons amdgpu gets away
>> without merging that other drivers might not benefit from, and there
>> might also be a benefit to amdgpu from merging that you haven't looked
>> at yet, so I think we could leave merging as an optional extra driver
>> knob here. The userspace API should operate the same; it would just be
>> the GPU page tables that would end up with different sizes.
> 
> Yeah, agree completely. The point is that we should not have complexity 
> inside the kernel which is not necessarily needed in the kernel.
> 
> So merging or not is something we have gone back and forth on for
> amdgpu: on the one hand it reduces the memory footprint of the
> housekeeping overhead, on the other hand it makes the handling more
> complex and error prone and uses a few more CPU cycles.
> 
> For amdgpu, merging is mostly beneficial when you can get rid of a whole
> page table layer in the hierarchy, but for this you need to merge at
> least 2MiB or 1GiB together (with 4KiB pages and 512 entries per page
> table, one full lowest-level table covers exactly 2MiB). And since that
> case doesn't happen that often we stopped doing it.
> 
> But for my understanding, why do you need the ranges for the merging?
> Isn't it sufficient to check that the mappings have the same type,
> flags, BO, or whatever is backing them?

Not entirely. Let's assume userspace creates two virtually contiguous 
buffers (VKBuffer) A and B. Userspace could bind a BO with BO offset 0 
to A (binding 1) and afterwards bind the same BO with BO offset 
length(A) to B (binding 2), maybe unlikely but AFAIK not illegal.

If we don't know about the bounds of A and B in the kernel, we detect 
that both bindings are virtually and physically contiguous and we merge 
them.

In the best case this is simply useless, because we'll need to split
them anyway later on when A or B is destroyed, but in the worst case we
could fault the GPU, e.g. if merging leads to a change of the page
tables that are backing binding 1, but buffer A is already in use by
userspace.
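
In other words, knowing the region boundaries adds exactly one more
condition to the merge check, roughly like this (illustrative sketch
only, made-up structure and field names apart from drm_gpuva_region):

/* Only merge mappings created within the same region; this is what keeps
 * binding 1 (in A) and binding 2 (in B) separate, even though they are
 * virtually and physically contiguous.
 */
struct va_mapping {
	struct drm_gpuva_region *region; /* the VKBuffer's VA region */
	u64 addr, range;                 /* VA start and size */
	struct drm_gem_object *obj;      /* backing BO */
	u64 offset;                      /* offset into the BO */
};

static bool mergeable(const struct va_mapping *a, const struct va_mapping *b)
{
	return a->region == b->region &&
	       a->addr + a->range == b->addr &&
	       a->obj == b->obj &&
	       a->offset + a->range == b->offset;
}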

In Nouveau, I think we could also get rid of regions and do something
driver specific for the handling of the dual page tables, which I want
to use for sparse regions, and just not merge (at least for now). But
exactly for the sake of not limiting drivers in their HW specifics I
thought it'd be great if merging were supported in case it makes sense
for a specific HW, especially given the fact that memory sizes are
increasing.

> 
> Regards,
> Christian.
> 
> 
>>
>> Dave.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
                     ` (3 preceding siblings ...)
  2023-01-27  0:24   ` Matthew Brost
@ 2023-02-03 17:37   ` Matthew Brost
  2023-02-06 13:35     ` Christian König
  2023-02-14 11:52     ` Danilo Krummrich
  4 siblings, 2 replies; 75+ messages in thread
From: Matthew Brost @ 2023-02-03 17:37 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
> This adds the infrastructure for a manager implementation to keep track
> of GPU virtual address (VA) mappings.
> 
> New UAPIs, motivated by the Vulkan sparse memory bindings that graphics
> drivers are starting to implement, allow userspace applications to
> request multiple and arbitrary GPU VA mappings of buffer objects. The
> DRM GPU VA manager is
> intended to serve the following purposes in this context.
> 
> 1) Provide a dedicated range allocator to track GPU VA allocations and
>    mappings, making use of the drm_mm range allocator.
> 
> 2) Generically connect GPU VA mappings to their backing buffers, in
>    particular DRM GEM objects.
> 
> 3) Provide a common implementation to perform more complex mapping
>    operations on the GPU VA space. In particular splitting and merging
>    of GPU VA mappings, e.g. for intersecting mapping requests or partial
>    unmap requests.
> 

Over the past week I've hacked together a PoC port of Xe to GPUVA [1]; so
far it seems really promising. It's 95% of the way to being feature
equivalent to the current Xe VM bind implementation, and I have line of
sight to getting sparse bindings implemented on top of GPUVA too. IMO
this has basically everything we need for Xe with a few tweaks.

I am out until 2/14 but wanted to get my thoughts / suggestions out on
the list before I leave.

1. The GPUVA post didn't support the way Xe does userptrs - a NULL GEM. I
believe with [2], [3], and [4] GPUVA will support NULL GEMs. My thinking
is that sparse binds will also have NULL GEMs; more on sparse bindings
below.

2. I agree with Christian that drm_mm probably isn't what we want to
base the GPUVA implementation on; rather, an RB tree or maple tree has
been discussed. The implementation should be fairly easy to tune once we
have benchmarks running, so I'm not too concerned here as we can figure
this out down the line.

3. In Xe we want to create an xe_vm_op list which inherits from
drm_gpuva_op. I've done this with a hack [5]; I believe when we rebase we
can do this with a custom callback to allocate a larger op size.

4. I'd like to add user bits to drm_gpuva_flags like I do in [6]. This is
similar to DMA_FENCE_FLAG_USER_BITS (see the sketch after this list).

5. In Xe we have a VM prefetch operation which is needed for our compute
UMD with page faults. I'd like to add a prefetch type of operation like
we do in [7].

6. In Xe we have a VM unbind-all-mappings-for-a-GEM IOCTL; I'd like to add
support for generating this operation list to GPUVA like we do in [8].

7. I've thought about how Xe will implement sparse mappings (reads return
0, writes are dropped). My current thinking is a sparse mapping will be
represented as a drm_gpuva rather than a region like in Nouveau. Making
regions optional seems like a good idea to me, rather than forcing the
user of the GPUVA code to create one large region for the manager as I
currently do in the Xe PoC.

8. Personally I'd like the caller to own the locking for the GEM drm_gpuva
list (drm_gpuva_link_*, drm_gpuva_unlink_* functions). In Xe we will
almost certainly hold the GEM dma-resv lock when we touch this list, so an
extra lock here is redundant. Also, it's kinda goofy that the caller owns
the locking for drm_gpuva insertion / removal but not the locking for this
list.

WRT Christian's thoughts on common uAPI rules for VM binds, I kinda
like that idea but I don't think it is necessary. All of our uAPIs
should be close, but the GPUVA implementation should also be flexible
enough to fit all of our needs, and I think for the most part it is.

Let me know what everyone thinks about this. It would be great if, when
I'm back on 2/14, I could rebase the Xe port to GPUVA on another version
of the GPUVA code and get sparse binding support implemented. Also I'd
like to get GPUVA merged into the Xe repo ASAP, as our VM bind code badly
needs to be cleaned up and this was the push we needed to make that
happen.

Matt

[1] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314
[2] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=2ae21d7a3f52e5eb2c105ed8ae231471274bdc36
[3] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=49fca9f5d96201f5cbd1b19c7ff17eedfac65cdc
[4] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=61fa6b1e1f10e791ae82358fa971b04421d53024
[5] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=87fc08dcf0840e794b38269fe4c6a95d088d79ec
[6] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=a4826c22f6788bc29906ffa263c1cd3c4661fa77
[7] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=f008bbb55b213868e52c7b9cda4c1bfb95af6aee
[8] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=41f4f71c05d04d2b17d988dd95369b5df2d7f681

> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  Documentation/gpu/drm-mm.rst    |   31 +
>  drivers/gpu/drm/Makefile        |    1 +
>  drivers/gpu/drm/drm_gem.c       |    3 +
>  drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
>  include/drm/drm_drv.h           |    6 +
>  include/drm/drm_gem.h           |   75 ++
>  include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
>  7 files changed, 1966 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>  create mode 100644 include/drm/drm_gpuva_mgr.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a52e6f4117d6..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>  .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>     :export:
>  
> +DRM GPU VA Manager
> +==================
> +
> +Overview
> +--------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Overview
> +
> +Split and Merge
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Split and Merge
> +
> +Locking
> +-------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :doc: Locking
> +
> +
> +DRM GPU VA Manager Function References
> +--------------------------------------
> +
> +.. kernel-doc:: include/drm/drm_gpuva_mgr.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
> +   :export:
> +
>  DRM Buddy Allocator
>  ===================
>  
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 4fe190aee584..de2ffca3b6e4 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -45,6 +45,7 @@ drm-y := \
>  	drm_vblank.o \
>  	drm_vblank_work.o \
>  	drm_vma_manager.o \
> +	drm_gpuva_mgr.o \
>  	drm_writeback.o
>  drm-$(CONFIG_DRM_LEGACY) += \
>  	drm_agpsupport.o \
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 59a0bb5ebd85..65115fe88627 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>  	if (!obj->resv)
>  		obj->resv = &obj->_resv;
>  
> +	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
> +		drm_gem_gpuva_init(obj);
> +
>  	drm_vma_node_reset(&obj->vma_node);
>  	INIT_LIST_HEAD(&obj->lru_node);
>  }
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> new file mode 100644
> index 000000000000..e665f642689d
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -0,0 +1,1323 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Red Hat.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors:
> + *     Danilo Krummrich <dakr@redhat.com>
> + *
> + */
> +
> +#include <drm/drm_gem.h>
> +#include <drm/drm_gpuva_mgr.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
> + * of a GPU's virtual address (VA) space and manages the corresponding virtual
> + * mappings represented by &drm_gpuva objects. It also keeps track of the
> + * mapping's backing &drm_gem_object buffers.
> + *
> + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
> + * &drm_gpuva objects representing all existent GPU VA mappings using this
> + * &drm_gem_object as backing buffer.
> + *
> + * A GPU VA mapping can only be created within a previously allocated
> + * &drm_gpuva_region, which represents a reserved portion of the GPU VA space.
> + * GPU VA mappings are not allowed to span over a &drm_gpuva_region's boundary.
> + *
> + * GPU VA regions can also be flagged as sparse, which allows drivers to create
> + * sparse mappings for a whole GPU VA region in order to support Vulkan
> + * 'Sparse Resources'.
> + *
> + * The GPU VA manager internally uses the &drm_mm range allocator to manage the
> + * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
> + * space.
> + *
> + * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
> + * the &drm_gpuva_manager contains a special region representing the portion of
> + * VA space reserved by the kernel. This node is initialized together with the
> + * GPU VA manager instance and removed when the GPU VA manager is destroyed.
> + *
> + * In a typical application drivers would embed struct drm_gpuva_manager,
> + * struct drm_gpuva_region and struct drm_gpuva within their own driver
> + * specific structures; hence, there are no memory allocations of its own
> + * nor memory allocations of &drm_gpuva or &drm_gpuva_region entries.
> + */
> +
> +/**
> + * DOC: Split and Merge
> + *
> + * The DRM GPU VA manager also provides an algorithm implementing splitting and
> + * merging of existent GPU VA mappings with the ones that are requested to be
> + * mapped or unmapped. This feature is required by the Vulkan API to implement
> + * Vulkan 'Sparse Memory Bindings' - drivers' UAPIs often refer to this as
> + * VM BIND.
> + *
> + * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list of map, unmap
> + * and remap operations for a given newly requested mapping. This list
> + * represents the set of operations to execute in order to integrate the new
> + * mapping cleanly into the current state of the GPU VA space.
> + *
> + * Depending on how the new GPU VA mapping intersects with the existent mappings
> + * of the GPU VA space the &drm_gpuva_ops contain an arbitrary amount of unmap
> + * operations, a maximum of two remap operations and a single map operation.
> + * The set of operations can also be empty if no operation is required, e.g. if
> + * the requested mapping already exists in the exact same way.
> + *
> + * The single map operation, if existent, represents the original map operation
> + * requested by the caller. Please note that this operation might be altered
> + * compared to the original map operation, e.g. because it was merged with
> + * an already existent mapping. Hence, drivers must execute this map operation
> + * instead of the original one they passed to drm_gpuva_sm_map_ops_create().
> + *
> + * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
> + * &drm_gpuva to unmap is physically contiguous with the original mapping
> + * request. Optionally, if 'keep' is set, drivers may keep the actual page table
> + * entries for this &drm_gpuva, adding the missing page table entries only and
> + * update the &drm_gpuva_manager's view of things accordingly.
> + *
> + * Drivers may do the same optimization, namely delta page table updates, also
> + * for remap operations. This is possible since &drm_gpuva_op_remap consists of
> + * one unmap operation and one or two map operations, such that drivers can
> + * derive the page table update delta accordingly.
> + *
> + * Note that there can't be more than two existent mappings to split up, one at
> + * the beginning and one at the end of the new mapping, hence there is a
> + * maximum of two remap operations.
> + *
> + * Generally, the DRM GPU VA manager never merges mappings across the
> + * boundaries of &drm_gpuva_regions. This is the case since merging between
> + * GPU VA regions would result in unmap and map operations being issued for
> + * both regions involved although the original mapping request referred to
> + * one specific GPU VA region only. Since the other GPU VA region, the one not
> + * explicitly requested to be altered, might be in use by the GPU, we are not
> + * allowed to issue any map/unmap operations for this region.
> + *
> + * Note that before calling drm_gpuva_sm_map_ops_create() again with another
> + * mapping request it is necessary to update the &drm_gpuva_manager's view of
> + * the GPU VA space. The previously obtained operations must be either fully
> + * processed or completely abandoned.
> + *
> + * To update the &drm_gpuva_manager's view of the GPU VA space
> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
> + * drm_gpuva_destroy_unlocked() should be used.
> + *
> + * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
> + * provides drivers the list of operations to be executed in order to unmap
> + * a range of GPU VA space. The logic behind this function is way simpler
> + * though: For all existent mappings enclosed by the given range, unmap
> + * operations are created. For mappings which are only partially located within
> + * the given range, remap operations are created such that those mappings are
> + * split up and re-mapped partially.
> + *
> + * The following diagrams depict the basic constellations of existent GPU VA
> + * mappings, a newly requested mapping and the resulting mappings as implemented
> + * by drm_gpuva_sm_map_ops_create() - it doesn't cover arbitrary combinations
> + * of those constellations.
> + *
> + * ::
> + *
> + *	1) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	2) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     a     1
> + *	req: |-----------| (bo_offset=m)
> + *
> + *	     0     a     1
> + *	new: |-----------| (bo_offset=m)
> + *
> + *
> + *	3) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     b     1
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	4) Existent mapping is replaced.
> + *	--------------------------------
> + *
> + *	     0  a  1
> + *	old: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or bo_offset.
> + *
> + *
> + *	5) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  b  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0  b  1  a' 2
> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	6) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	     0  a  1
> + *	req: |-----|       (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	7) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  b  2
> + *	req:       |-----| (bo_offset=m)
> + *
> + *	     0  a  1  b  2
> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *	8) Existent mapping is kept.
> + *	----------------------------
> + *
> + *	      0     a     2
> + *	old: |-----------| (bo_offset=n)
> + *
> + *	           1  a  2
> + *	req:       |-----| (bo_offset=n+1)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *
> + *	9) Existent mapping is split.
> + *	-----------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------|       (bo_offset=n)
> + *
> + *	           1     b     3
> + *	req:       |-----------| (bo_offset=m)
> + *
> + *	     0  a  1     b     3
> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> + *
> + *
> + *	10) Existent mapping is merged.
> + *	-------------------------------
> + *
> + *	     0     a     2
> + *	old: |-----------|       (bo_offset=n)
> + *
> + *	           1     a     3
> + *	req:       |-----------| (bo_offset=n+1)
> + *
> + *	     0        a        3
> + *	new: |-----------------| (bo_offset=n)
> + *
> + *
> + *	11) Existent mapping is split.
> + *	------------------------------
> + *
> + *	     0        a        3
> + *	old: |-----------------| (bo_offset=n)
> + *
> + *	           1  b  2
> + *	req:       |-----|       (bo_offset=m)
> + *
> + *	     0  a  1  b  2  a' 3
> + *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
> + *
> + *
> + *	12) Existent mapping is kept.
> + *	-----------------------------
> + *
> + *	     0        a        3
> + *	old: |-----------------| (bo_offset=n)
> + *
> + *	           1  a  2
> + *	req:       |-----|       (bo_offset=n+1)
> + *
> + *	     0        a        3
> + *	new: |-----------------| (bo_offset=n)
> + *
> + *
> + *	13) Existent mapping is replaced.
> + *	---------------------------------
> + *
> + *	           1  a  2
> + *	old:       |-----| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	req: |-----------| (bo_offset=n)
> + *
> + *	     0     a     2
> + *	new: |-----------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	14) Existent mapping is replaced.
> + *	---------------------------------
> + *
> + *	           1  a  2
> + *	old:       |-----| (bo_offset=n)
> + *
> + *	     0        a       3
> + *	req: |----------------| (bo_offset=n)
> + *
> + *	     0        a       3
> + *	new: |----------------| (bo_offset=n)
> + *
> + *	Note: We expect to see the same result for a request with a different bo
> + *	      and/or non-contiguous bo_offset.
> + *
> + *
> + *	15) Existent mapping is split.
> + *	------------------------------
> + *
> + *	           1     a     3
> + *	old:       |-----------| (bo_offset=n)
> + *
> + *	     0     b     2
> + *	req: |-----------|       (bo_offset=m)
> + *
> + *	     0     b     2  a' 3
> + *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
> + *
> + *
> + *	16) Existent mappings are merged.
> + *	---------------------------------
> + *
> + *	     0     a     1
> + *	old: |-----------|                        (bo_offset=n)
> + *
> + *	                            2     a     3
> + *	old':                       |-----------| (bo_offset=n+2)
> + *
> + *	                1     a     2
> + *	req:            |-----------|             (bo_offset=n+1)
> + *
> + *	                      a
> + *	new: |----------------------------------| (bo_offset=n)
> + */
> +
> +/**
> + * DOC: Locking
> + *
> + * Generally, the GPU VA manager does not take care of locking itself; it is
> + * the driver's responsibility to take care of locking. Drivers might want to
> + * protect the following operations: inserting, destroying and iterating
> + * &drm_gpuva and &drm_gpuva_region objects as well as generating split and merge
> + * operations.
> + *
> + * The GPU VA manager does take care of the locking of the backing
> + * &drm_gem_object buffers' GPU VA lists though, unless the provided function's
> + * documentation claims otherwise.
> + */
> +
> +/**
> + * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
> + * @mgr: pointer to the &drm_gpuva_manager to initialize
> + * @name: the name of the GPU VA space
> + * @start_offset: the start offset of the GPU VA space
> + * @range: the size of the GPU VA space
> + * @reserve_offset: the start of the kernel reserved GPU VA area
> + * @reserve_range: the size of the kernel reserved GPU VA area
> + *
> + * The &drm_gpuva_manager must be initialized with this function before use.
> + *
> + * Note that @mgr must be cleared to 0 before calling this function. The given
> + * &name is expected to be managed by the surrounding driver structures.
> + */
> +void
> +drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +		       const char *name,
> +		       u64 start_offset, u64 range,
> +		       u64 reserve_offset, u64 reserve_range)
> +{
> +	drm_mm_init(&mgr->va_mm, start_offset, range);
> +	drm_mm_init(&mgr->region_mm, start_offset, range);
> +
> +	mgr->mm_start = start_offset;
> +	mgr->mm_range = range;
> +
> +	mgr->name = name ? name : "unknown";
> +
> +	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
> +	mgr->kernel_alloc_node.start = reserve_offset;
> +	mgr->kernel_alloc_node.size = reserve_range;
> +	drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_init);
> +
> +/**
> + * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
> + * @mgr: pointer to the &drm_gpuva_manager to clean up
> + *
> + * Note that it is a bug to call this function on a manager that still
> + * holds GPU VA mappings.
> + */
> +void
> +drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
> +{
> +	mgr->name = NULL;
> +	drm_mm_remove_node(&mgr->kernel_alloc_node);
> +	drm_mm_takedown(&mgr->va_mm);
> +	drm_mm_takedown(&mgr->region_mm);
> +}
> +EXPORT_SYMBOL(drm_gpuva_manager_destroy);
> +
> +static struct drm_gpuva_region *
> +drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	struct drm_gpuva_region *reg;
> +
> +	/* Find the VA region the requested range is strictly enclosed by. */
> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
> +		if (reg->node.start <= addr &&
> +		    reg->node.start + reg->node.size >= addr + range &&
> +		    &reg->node != &mgr->kernel_alloc_node)
> +			return reg;
> +	}
> +
> +	return NULL;
> +}
> +
> +static bool
> +drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
> +{
> +	return !!drm_gpuva_in_region(mgr, addr, range);
> +}
> +
> +/**
> + * drm_gpuva_insert - insert a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
> + * @va: the &drm_gpuva to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * The function assumes the caller does not hold the &drm_gem_object's
> + * GPU VA list mutex.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +		 struct drm_gpuva *va,
> +		 u64 addr, u64 range)
> +{
> +	struct drm_gpuva_region *reg;
> +	int ret;
> +
> +	if (!va->gem.obj)
> +		return -EINVAL;
> +
> +	reg = drm_gpuva_in_region(mgr, addr, range);
> +	if (!reg)
> +		return -EINVAL;
> +
> +	ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
> +					  range, 0,
> +					  0, addr,
> +					  addr + range,
> +					  DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
> +	if (ret)
> +		return ret;
> +
> +	va->mgr = mgr;
> +	va->region = reg;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_insert);
> +
> +/**
> + * drm_gpuva_link_locked - link a &drm_gpuva
> + * @va: the &drm_gpuva to link
> + *
> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * The function assumes the caller already holds the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_link_locked(struct drm_gpuva *va)
> +{
> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
> +	list_add_tail(&va->head, &va->gem.obj->gpuva.list);
> +}
> +EXPORT_SYMBOL(drm_gpuva_link_locked);
> +
> +/**
> + * drm_gpuva_link_unlocked - link a &drm_gpuva
> + * @va: the &drm_gpuva to link
> + *
> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * The function assumes the caller does not hold the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_link_unlocked(struct drm_gpuva *va)
> +{
> +	drm_gem_gpuva_lock(va->gem.obj);
> +	drm_gpuva_link_locked(va);
> +	drm_gem_gpuva_unlock(va->gem.obj);
> +}
> +EXPORT_SYMBOL(drm_gpuva_link_unlocked);
> +
> +/**
> + * drm_gpuva_unlink_locked - unlink a &drm_gpuva
> + * @va: the &drm_gpuva to unlink
> + *
> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * The function assumes the caller already holds the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_unlink_locked(struct drm_gpuva *va)
> +{
> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
> +	list_del_init(&va->head);
> +}
> +EXPORT_SYMBOL(drm_gpuva_unlink_locked);
> +
> +/**
> + * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
> + * @va: the &drm_gpuva to unlink
> + *
> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> + * associated with.
> + *
> + * The function assumes the caller does not hold the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
> +{
> +	drm_gem_gpuva_lock(va->gem.obj);
> +	drm_gpuva_unlink_locked(va);
> +	drm_gem_gpuva_unlock(va->gem.obj);
> +}
> +EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
> +
> +/**
> + * drm_gpuva_destroy_locked - destroy a &drm_gpuva
> + * @va: the &drm_gpuva to destroy
> + *
> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> + * associated with and removes it from the underlying range allocator.
> + *
> + * The function assumes the caller already holds the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_destroy_locked(struct drm_gpuva *va)
> +{
> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
> +
> +	list_del(&va->head);
> +	drm_mm_remove_node(&va->node);
> +}
> +EXPORT_SYMBOL(drm_gpuva_destroy_locked);
> +
> +/**
> + * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
> + * @va: the &drm_gpuva to destroy
> + *
> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
> + * associated with and removes it from the underlying range allocator.
> + *
> + * The function assumes the caller does not hold the &drm_gem_object's
> + * GPU VA list mutex.
> + */
> +void
> +drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
> +{
> +	drm_gem_gpuva_lock(va->gem.obj);
> +	list_del(&va->head);
> +	drm_gem_gpuva_unlock(va->gem.obj);
> +
> +	drm_mm_remove_node(&va->node);
> +}
> +EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
> +
> +/**
> + * drm_gpuva_find - find a &drm_gpuva
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuvas address
> + * @range: the &drm_gpuvas range
> + *
> + * Returns: the &drm_gpuva at a given &addr and with a given &range
> + */
> +struct drm_gpuva *
> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +	       u64 addr, u64 range)
> +{
> +	struct drm_gpuva *va;
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
> +		if (va->node.start == addr &&
> +		    va->node.size == range)
> +			return va;
> +	}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find);
> +
> +/**
> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @start: the given GPU VA's start address
> + *
> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> +{
> +	struct drm_mm_node *node;
> +
> +	if (start <= mgr->mm_start ||
> +	    start > (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
> +	if (node == &mgr->va_mm.head_node)
> +		return NULL;
> +
> +	return (struct drm_gpuva *)node;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_prev);
> +
> +/**
> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
> + * @mgr: the &drm_gpuva_manager to search in
> + * @end: the given GPU VA's end address
> + *
> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
> + *
> + * Note that if there is any free space between the GPU VA mappings no mapping
> + * is returned.
> + *
> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> + */
> +struct drm_gpuva *
> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
> +{
> +	struct drm_mm_node *node;
> +
> +	if (end < mgr->mm_start ||
> +	    end >= (mgr->mm_start + mgr->mm_range))
> +		return NULL;
> +
> +	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
> +	if (node == &mgr->va_mm.head_node)
> +		return NULL;
> +
> +	return (struct drm_gpuva *)node;
> +}
> +EXPORT_SYMBOL(drm_gpuva_find_next);
> +
> +/**
> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva_region in
> + * @reg: the &drm_gpuva_region to insert
> + * @addr: the start address of the GPU VA
> + * @range: the range of the GPU VA
> + *
> + * Insert a &drm_gpuva_region with a given address and range into a
> + * &drm_gpuva_manager.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			struct drm_gpuva_region *reg,
> +			u64 addr, u64 range)
> +{
> +	int ret;
> +
> +	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
> +					  range, 0,
> +					  0, addr,
> +					  addr + range,
> +					  DRM_MM_INSERT_LOW|
> +					  DRM_MM_INSERT_ONCE);
> +	if (ret)
> +		return ret;
> +
> +	reg->mgr = mgr;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_insert);
> +
> +/**
> + * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager holding the region
> + * @reg: the &drm_gpuva_region to destroy
> + *
> + * This removes the given &reg from the underlying range allocator.
> + */
> +void
> +drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
> +			 struct drm_gpuva_region *reg)
> +{
> +	struct drm_gpuva *va;
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr,
> +				       reg->node.start,
> +				       reg->node.size) {
> +		WARN(1, "GPU VA region must be empty on destroy.\n");
> +		return;
> +	}
> +
> +	if (&reg->node == &mgr->kernel_alloc_node) {
> +		WARN(1, "Can't destroy kernel reserved region.\n");
> +		return;
> +	}
> +
> +	drm_mm_remove_node(&reg->node);
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_destroy);
> +
> +/**
> + * drm_gpuva_region_find - find a &drm_gpuva_region
> + * @mgr: the &drm_gpuva_manager to search in
> + * @addr: the &drm_gpuva_regions address
> + * @range: the &drm_gpuva_regions range
> + *
> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
> + */
> +struct drm_gpuva_region *
> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> +		      u64 addr, u64 range)
> +{
> +	struct drm_gpuva_region *reg;
> +
> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
> +		if (reg->node.start == addr &&
> +		    reg->node.size == range)
> +			return reg;
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL(drm_gpuva_region_find);
> +
> +static int
> +gpuva_op_map_new(struct drm_gpuva_op **pop,
> +		 u64 addr, u64 range,
> +		 struct drm_gem_object *obj, u64 offset)
> +{
> +	struct drm_gpuva_op *op;
> +
> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	op->op = DRM_GPUVA_OP_MAP;
> +	op->map.va.addr = addr;
> +	op->map.va.range = range;
> +	op->map.gem.obj = obj;
> +	op->map.gem.offset = offset;
> +
> +	return 0;
> +}
> +
> +static int
> +gpuva_op_remap_new(struct drm_gpuva_op **pop,
> +		   struct drm_gpuva_op_map *prev,
> +		   struct drm_gpuva_op_map *next,
> +		   struct drm_gpuva_op_unmap *unmap)
> +{
> +	struct drm_gpuva_op *op;
> +	struct drm_gpuva_op_remap *r;
> +
> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	op->op = DRM_GPUVA_OP_REMAP;
> +	r = &op->remap;
> +
> +	if (prev) {
> +		r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
> +		if (!r->prev)
> +			goto err_free_op;
> +	}
> +
> +	if (next) {
> +		r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
> +		if (!r->next)
> +			goto err_free_prev;
> +	}
> +
> +	r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
> +	if (!r->unmap)
> +		goto err_free_next;
> +
> +	return 0;
> +
> +err_free_next:
> +	if (next)
> +		kfree(r->next);
> +err_free_prev:
> +	if (prev)
> +		kfree(r->prev);
> +err_free_op:
> +	kfree(op);
> +	*pop = NULL;
> +
> +	return -ENOMEM;
> +}
> +
> +static int
> +gpuva_op_unmap_new(struct drm_gpuva_op **pop,
> +		   struct drm_gpuva *va, bool merge)
> +{
> +	struct drm_gpuva_op *op;
> +
> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	op->op = DRM_GPUVA_OP_UNMAP;
> +	op->unmap.va = va;
> +	op->unmap.keep = merge;
> +
> +	return 0;
> +}
> +
> +#define op_map_new_to_list(_ops, _addr, _range,		\
> +			   _obj, _offset)		\
> +do {							\
> +	struct drm_gpuva_op *op;			\
> +							\
> +	ret = gpuva_op_map_new(&op, _addr, _range,	\
> +			       _obj, _offset);		\
> +	if (ret)					\
> +		goto err_free_ops;			\
> +							\
> +	list_add_tail(&op->entry, _ops);		\
> +} while (0)
> +
> +#define op_remap_new_to_list(_ops, _prev, _next,	\
> +			     _unmap)			\
> +do {							\
> +	struct drm_gpuva_op *op;			\
> +							\
> +	ret = gpuva_op_remap_new(&op, _prev, _next,	\
> +				 _unmap);		\
> +	if (ret)					\
> +		goto err_free_ops;			\
> +							\
> +	list_add_tail(&op->entry, _ops);		\
> +} while (0)
> +
> +#define op_unmap_new_to_list(_ops, _gpuva, _merge)	\
> +do {							\
> +	struct drm_gpuva_op *op;			\
> +							\
> +	ret = gpuva_op_unmap_new(&op, _gpuva, _merge);	\
> +	if (ret)					\
> +		goto err_free_ops;			\
> +							\
> +	list_add_tail(&op->entry, _ops);		\
> +} while (0)
> +
> +/**
> + * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the new mapping
> + * @req_range: the range of the new mapping
> + * @req_obj: the &drm_gem_object to map
> + * @req_offset: the offset within the &drm_gem_object
> + *
> + * This function creates a list of operations to perform splitting and merging
> + * of existent mapping(s) with the newly requested one.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
> + * in the given order. It can contain map, unmap and remap operations, but it
> + * also can be empty if no operation is required, e.g. if the requested mapping
> + * already exists in the exact same way.
> + *
> + * There can be an arbitrary amount of unmap operations, a maximum of two remap
> + * operations and a single map operation. The latter one, if existent,
> + * represents the original map operation requested by the caller. Please note
> + * that the map operation might have been modified, e.g. if it was
> + * merged with an existent mapping.
> + *
> + * Note that before calling this function again with another mapping request it
> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
> + * The previously obtained operations must be either processed or abandoned.
> + * To update the &drm_gpuva_manager's view of the GPU VA space
> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
> + * drm_gpuva_destroy_unlocked() should be used.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
> +			    u64 req_addr, u64 req_range,
> +			    struct drm_gem_object *req_obj, u64 req_offset)
> +{
> +	struct drm_gpuva_ops *ops;
> +	struct drm_gpuva *va, *prev = NULL;
> +	u64 req_end = req_addr + req_range;
> +	bool skip_pmerge = false, skip_nmerge = false;
> +	int ret;
> +
> +	if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
> +		return ERR_PTR(-EINVAL);
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (!ops)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->node.start;
> +		u64 range = va->node.size;
> +		u64 end = addr + range;
> +
> +		/* Generally, we want to skip merging with potential mappings
> +		 * left and right of the requested one when we found a
> +		 * collision, since merging happens in this loop already.
> +		 *
> +		 * However, there is one exception when the requested mapping
> +		 * spans into a free VM area. If this is the case we might
> +		 * still hit the boundary of another mapping before and/or
> +		 * after the free VM area.
> +		 */
> +		skip_pmerge = true;
> +		skip_nmerge = true;
> +
> +		if (addr == req_addr) {
> +			bool merge = obj == req_obj &&
> +				     offset == req_offset;
> +			if (end == req_end) {
> +				if (merge)
> +					goto done;
> +
> +				op_unmap_new_to_list(&ops->list, va, false);
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				skip_nmerge = false;
> +				op_unmap_new_to_list(&ops->list, va, merge);
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = range - req_range,
> +					.gem.obj = obj,
> +					.gem.offset = offset + req_range,
> +				};
> +				struct drm_gpuva_op_unmap u = { .va = va };
> +
> +				if (merge)
> +					goto done;
> +
> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
> +				break;
> +			}
> +		} else if (addr < req_addr) {
> +			u64 ls_range = req_addr - addr;
> +			struct drm_gpuva_op_map p = {
> +				.va.addr = addr,
> +				.va.range = ls_range,
> +				.gem.obj = obj,
> +				.gem.offset = offset,
> +			};
> +			struct drm_gpuva_op_unmap u = { .va = va };
> +			bool merge = obj == req_obj &&
> +				     offset + ls_range == req_offset;
> +
> +			if (end == req_end) {
> +				if (merge)
> +					goto done;
> +
> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				u64 new_addr = addr;
> +				u64 new_range = req_range + ls_range;
> +				u64 new_offset = offset;
> +
> +				/* We validated that the requested mapping is
> +				 * within a single VA region already.
> +				 * Since it overlaps the current mapping (which
> +				 * can't cross a VA region boundary) we can be
> +				 * sure that we're still within the boundaries
> +				 * of the same VA region after merging.
> +				 */
> +				if (merge) {
> +					req_offset = new_offset;
> +					req_addr = new_addr;
> +					req_range = new_range;
> +					op_unmap_new_to_list(&ops->list, va, true);
> +					goto next;
> +				}
> +
> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = end - req_end,
> +					.gem.obj = obj,
> +					.gem.offset = offset + ls_range +
> +						      req_range,
> +				};
> +
> +				if (merge)
> +					goto done;
> +
> +				op_remap_new_to_list(&ops->list, &p, &n, &u);
> +				break;
> +			}
> +		} else if (addr > req_addr) {
> +			bool merge = obj == req_obj &&
> +				     offset == req_offset +
> +					       (addr - req_addr);
> +			if (!prev)
> +				skip_pmerge = false;
> +
> +			if (end == req_end) {
> +				op_unmap_new_to_list(&ops->list, va, merge);
> +				break;
> +			}
> +
> +			if (end < req_end) {
> +				skip_nmerge = false;
> +				op_unmap_new_to_list(&ops->list, va, merge);
> +				goto next;
> +			}
> +
> +			if (end > req_end) {
> +				struct drm_gpuva_op_map n = {
> +					.va.addr = req_end,
> +					.va.range = end - req_end,
> +					.gem.obj = obj,
> +					.gem.offset = offset + req_end - addr,
> +				};
> +				struct drm_gpuva_op_unmap u = { .va = va };
> +				u64 new_end = end;
> +				u64 new_range = new_end - req_addr;
> +
> +				/* We validated that the requested mapping is
> +				 * within a single VA region already.
> +				 * Since it overlaps the current mapping (which
> +				 * can't cross a VA region boundary) we can be
> +				 * sure that we're still within the boundaries
> +				 * of the same VA region after merging.
> +				 */
> +				if (merge) {
> +					req_end = new_end;
> +					req_range = new_range;
> +					op_unmap_new_to_list(&ops->list, va, true);
> +					break;
> +				}
> +
> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
> +				break;
> +			}
> +		}
> +next:
> +		prev = va;
> +	}
> +
> +	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
> +	if (va) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->node.start;
> +		u64 range = va->node.size;
> +		u64 new_offset = offset;
> +		u64 new_addr = addr;
> +		u64 new_range = req_range + range;
> +		bool merge = obj == req_obj &&
> +			     offset + range == req_offset;
> +
> +		/* Don't merge over VA region boundaries. */
> +		merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
> +		if (merge) {
> +			op_unmap_new_to_list(&ops->list, va, true);
> +
> +			req_offset = new_offset;
> +			req_addr = new_addr;
> +			req_range = new_range;
> +		}
> +	}
> +
> +	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
> +	if (va) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->node.start;
> +		u64 range = va->node.size;
> +		u64 end = addr + range;
> +		u64 new_range = req_range + range;
> +		u64 new_end = end;
> +		bool merge = obj == req_obj &&
> +			     offset == req_offset + req_range;
> +
> +		/* Don't merge over VA region boundaries. */
> +		merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
> +		if (merge) {
> +			op_unmap_new_to_list(&ops->list, va, true);
> +
> +			req_range = new_range;
> +			req_end = new_end;
> +		}
> +	}
> +
> +	op_map_new_to_list(&ops->list,
> +			   req_addr, req_range,
> +			   req_obj, req_offset);
> +
> +done:
> +	return ops;
> +
> +err_free_ops:
> +	drm_gpuva_ops_free(ops);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
> +
> +#undef op_map_new_to_list
> +#undef op_remap_new_to_list
> +#undef op_unmap_new_to_list
> +
> +/**
> + * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
> + * @req_addr: the start address of the range to unmap
> + * @req_range: the range of the mappings to unmap
> + *
> + * This function creates a list of operations to perform unmapping and, if
> + * required, splitting of the mappings overlapping the unmap range.
> + *
> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
> + * in the given order. It can contain unmap and remap operations, depending on
> + * whether there are actual overlapping mappings to split.
> + *
> + * There can be an arbitrary amount of unmap operations and a maximum of two
> + * remap operations.
> + *
> + * Note that before calling this function again with another range to unmap it
> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
> + * The previously obtained operations must be processed or abandoned.
> + * To update the &drm_gpuva_manager's view of the GPU VA space
> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
> + * drm_gpuva_destroy_unlocked() should be used.
> + *
> + * After the caller finished processing the returned &drm_gpuva_ops, they must
> + * be freed with &drm_gpuva_ops_free.
> + *
> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
> + */
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			      u64 req_addr, u64 req_range)
> +{
> +	struct drm_gpuva_ops *ops;
> +	struct drm_gpuva_op *op;
> +	struct drm_gpuva_op_remap *r;
> +	struct drm_gpuva *va;
> +	u64 req_end = req_addr + req_range;
> +	int ret;
> +
> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	if (!ops)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_LIST_HEAD(&ops->list);
> +
> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +		u64 offset = va->gem.offset;
> +		u64 addr = va->node.start;
> +		u64 range = va->node.size;
> +		u64 end = addr + range;
> +
> +		op = kzalloc(sizeof(*op), GFP_KERNEL);
> +		if (!op) {
> +			ret = -ENOMEM;
> +			goto err_free_ops;
> +		}
> +
> +		r = &op->remap;
> +
> +		if (addr < req_addr) {
> +			r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
> +			if (!r->prev) {
> +				ret = -ENOMEM;
> +				goto err_free_op;
> +			}
> +
> +			r->prev->va.addr = addr;
> +			r->prev->va.range = req_addr - addr;
> +			r->prev->gem.obj = obj;
> +			r->prev->gem.offset = offset;
> +		}
> +
> +		if (end > req_end) {
> +			r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
> +			if (!r->next) {
> +				ret = -ENOMEM;
> +				goto err_free_prev;
> +			}
> +
> +			r->next->va.addr = req_end;
> +			r->next->va.range = end - req_end;
> +			r->next->gem.obj = obj;
> +			r->next->gem.offset = offset + (req_end - addr);
> +		}
> +
> +		if (op->remap.prev || op->remap.next) {
> +			op->op = DRM_GPUVA_OP_REMAP;
> +			r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
> +			if (!r->unmap) {
> +				ret = -ENOMEM;
> +				goto err_free_next;
> +			}
> +
> +			r->unmap->va = va;
> +		} else {
> +			op->op = DRM_GPUVA_OP_UNMAP;
> +			op->unmap.va = va;
> +		}
> +
> +		list_add_tail(&op->entry, &ops->list);
> +	}
> +
> +	return ops;
> +
> +err_free_next:
> +	if (r->next)
> +		kfree(r->next);
> +err_free_prev:
> +	if (r->prev)
> +		kfree(r->prev);
> +err_free_op:
> +	kfree(op);
> +err_free_ops:
> +	drm_gpuva_ops_free(ops);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
> +
> +/**
> + * drm_gpuva_ops_free - free the given &drm_gpuva_ops
> + * @ops: the &drm_gpuva_ops to free
> + *
> + * Frees the given &drm_gpuva_ops structure including all the ops associated
> + * with it.
> + */
> +void
> +drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
> +{
> +	struct drm_gpuva_op *op, *next;
> +
> +	drm_gpuva_for_each_op_safe(op, next, ops) {
> +		list_del(&op->entry);
> +		if (op->op == DRM_GPUVA_OP_REMAP) {
> +			if (op->remap.prev)
> +				kfree(op->remap.prev);
> +
> +			if (op->remap.next)
> +				kfree(op->remap.next);
> +
> +			kfree(op->remap.unmap);
> +		}
> +		kfree(op);
> +	}
> +
> +	kfree(ops);
> +}
> +EXPORT_SYMBOL(drm_gpuva_ops_free);
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index d7c521e8860f..6feacd93aca6 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -104,6 +104,12 @@ enum drm_driver_feature {
>  	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
>  	 */
>  	DRIVER_COMPUTE_ACCEL            = BIT(7),
> +	/**
> +	 * @DRIVER_GEM_GPUVA:
> +	 *
> +	 * Driver supports user defined GPU VA bindings for GEM objects.
> +	 */
> +	DRIVER_GEM_GPUVA		= BIT(8),
>  
>  	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
>  
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 772a4adf5287..4a3679034966 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -36,6 +36,8 @@
>  
>  #include <linux/kref.h>
>  #include <linux/dma-resv.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
>  
>  #include <drm/drm_vma_manager.h>
>  
> @@ -337,6 +339,17 @@ struct drm_gem_object {
>  	 */
>  	struct dma_resv _resv;
>  
> +	/**
> +	 * @gpuva:
> +	 *
> +	 * Provides the list and list mutex of GPU VAs attached to this
> +	 * GEM object.
> +	 */
> +	struct {
> +		struct list_head list;
> +		struct mutex mutex;
> +	} gpuva;
> +
>  	/**
>  	 * @funcs:
>  	 *
> @@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
>  unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
>  			       bool (*shrink)(struct drm_gem_object *obj));
>  
> +/**
> + * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
> + * @obj: the &drm_gem_object
> + *
> + * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
> + * protecting it.
> + *
> + * Calling this function is only necessary for drivers intending to support the
> + * &drm_driver_feature DRIVER_GEM_GPUVA.
> + */
> +static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
> +{
> +	INIT_LIST_HEAD(&obj->gpuva.list);
> +	mutex_init(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
> + * @obj: the &drm_gem_object
> + *
> + * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
> + */
> +static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
> +{
> +	mutex_lock(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
> + * @obj: the &drm_gem_object
> + *
> + * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
> + */
> +static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
> +{
> +	mutex_unlock(&obj->gpuva.mutex);
> +}
> +
> +/**
> + * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gem_object.
> + */
> +#define drm_gem_for_each_gpuva(entry, obj) \
> +	list_for_each_entry(entry, &obj->gpuva.list, head)
> +
> +/**
> + * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @next: &next &drm_gpuva to store the next step
> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
> + * it is safe against removal of elements.
> + */
> +#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
> +	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
> +
>  #endif /* __DRM_GEM_H__ */
> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
> new file mode 100644
> index 000000000000..adeb0c916e91
> --- /dev/null
> +++ b/include/drm/drm_gpuva_mgr.h
> @@ -0,0 +1,527 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#ifndef __DRM_GPUVA_MGR_H__
> +#define __DRM_GPUVA_MGR_H__
> +
> +/*
> + * Copyright (c) 2022 Red Hat.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <drm/drm_mm.h>
> +#include <linux/mm.h>
> +#include <linux/rbtree.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +struct drm_gpuva_region;
> +struct drm_gpuva;
> +struct drm_gpuva_ops;
> +
> +/**
> + * struct drm_gpuva_manager - DRM GPU VA Manager
> + *
> + * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
> + * the &drm_mm range allocator. Typically, this structure is embedded in bigger
> + * driver structures.
> + *
> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
> + * pages.
> + *
> + * There should be one manager instance per GPU virtual address space.
> + */
> +struct drm_gpuva_manager {
> +	/**
> +	 * @name: the name of the DRM GPU VA space
> +	 */
> +	const char *name;
> +
> +	/**
> +	 * @mm_start: start of the VA space
> +	 */
> +	u64 mm_start;
> +
> +	/**
> +	 * @mm_range: length of the VA space
> +	 */
> +	u64 mm_range;
> +
> +	/**
> +	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
> +	 */
> +	struct drm_mm region_mm;
> +
> +	/**
> +	 * @va_mm: the &drm_mm range allocator to track GPU VA mappings
> +	 */
> +	struct drm_mm va_mm;
> +
> +	/**
> +	 * @kernel_alloc_node:
> +	 *
> +	 * &drm_mm_node representing the address space cutout reserved for
> +	 * the kernel
> +	 */
> +	struct drm_mm_node kernel_alloc_node;
> +};
> +
> +void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
> +			    const char *name,
> +			    u64 start_offset, u64 range,
> +			    u64 reserve_offset, u64 reserve_range);
> +void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
> +
> +/**
> + * struct drm_gpuva_region - structure to track a portion of GPU VA space
> + *
> + * This structure represents a portion of a GPU's VA space and is associated
> + * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
> + *
> + * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
> + * placed within a &drm_gpuva_region.
> + */
> +struct drm_gpuva_region {
> +	/**
> +	 * @node: the &drm_mm_node to track the GPU VA region
> +	 */
> +	struct drm_mm_node node;
> +
> +	/**
> +	 * @mgr: the &drm_gpuva_manager this object is associated with
> +	 */
> +	struct drm_gpuva_manager *mgr;
> +
> +	/**
> +	 * @sparse: indicates whether this region is sparse
> +	 */
> +	bool sparse;
> +};
> +
> +struct drm_gpuva_region *
> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
> +		      u64 addr, u64 range);
> +int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
> +			    struct drm_gpuva_region *reg,
> +			    u64 addr, u64 range);
> +void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
> +			      struct drm_gpuva_region *reg);
> +
> +int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
> +		     struct drm_gpuva *va,
> +		     u64 addr, u64 range);
> +/**
> + * drm_gpuva_for_each_region_in_range - iterator to walk over a range of nodes
> + * @node__: &drm_gpuva_region structure to assign to in each iteration step
> + * @gpuva__: &drm_gpuva_manager structure to walk
> + * @start__: starting offset, the first node will overlap this
> + * @end__: ending offset, the last node will start before this (but may overlap)
> + *
> + * This iterator walks over all nodes in the range allocator that lie
> + * between @start and @end. It is implemented similarly to list_for_each(),
> + * but is using &drm_mm's internal interval tree to accelerate the search for
> + * the starting node, and hence isn't safe against removal of elements. It
> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
> + * backing &drm_mm, and may even continue indefinitely.
> + */
> +#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
> +	for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
> +									 (start__), (end__)-1); \
> +	     node__->node.start < (end__); \
> +	     node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
> +
> +/**
> + * drm_gpuva_for_each_region - iterator to walk over all GPU VA regions
> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
> + * @gpuva: &drm_gpuva_manager structure to walk
> + *
> + * This iterator walks over all &drm_gpuva_region structures associated with the
> + * &drm_gpuva_manager.
> + */
> +#define drm_gpuva_for_each_region(entry, gpuva) \
> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
> +
> +/**
> + * drm_gpuva_for_each_region_safe - iterator to safely walk over all GPU VA
> + * regions
> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
> + * @next: &next &drm_gpuva_region to store the next step
> + * @gpuva: &drm_gpuva_manager structure to walk
> + *
> + * This iterator walks over all &drm_gpuva_region structures associated with the
> + * &drm_gpuva_manager. It is implemented with list_for_each_safe(), so it is
> + * safe against removal of elements.
> + */
> +#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
> +
> +
> +/**
> + * enum drm_gpuva_flags - flags for struct drm_gpuva
> + */
> +enum drm_gpuva_flags {
> +	/**
> +	 * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is swapped
> +	 */
> +	DRM_GPUVA_SWAPPED = (1 << 0),
> +};
> +
> +/**
> + * struct drm_gpuva - structure to track a GPU VA mapping
> + *
> + * This structure represents a GPU VA mapping and is associated with a
> + * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
> + *
> + * Typically, this structure is embedded in bigger driver structures.
> + */
> +struct drm_gpuva {
> +	/**
> +	 * @node: the &drm_mm_node to track the GPU VA mapping
> +	 */
> +	struct drm_mm_node node;
> +
> +	/**
> +	 * @mgr: the &drm_gpuva_manager this object is associated with
> +	 */
> +	struct drm_gpuva_manager *mgr;
> +
> +	/**
> +	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
> +	 */
> +	struct drm_gpuva_region *region;
> +
> +	/**
> +	 * @head: the &list_head to attach this object to a &drm_gem_object
> +	 */
> +	struct list_head head;
> +
> +	/**
> +	 * @flags: the &drm_gpuva_flags for this mapping
> +	 */
> +	enum drm_gpuva_flags flags;
> +
> +	/**
> +	 * @gem: structure containing the &drm_gem_object and its offset
> +	 */
> +	struct {
> +		/**
> +		 * @offset: the offset within the &drm_gem_object
> +		 */
> +		u64 offset;
> +
> +		/**
> +		 * @obj: the mapped &drm_gem_object
> +		 */
> +		struct drm_gem_object *obj;
> +	} gem;
> +};
> +
> +void drm_gpuva_link_locked(struct drm_gpuva *va);
> +void drm_gpuva_link_unlocked(struct drm_gpuva *va);
> +void drm_gpuva_unlink_locked(struct drm_gpuva *va);
> +void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
> +
> +void drm_gpuva_destroy_locked(struct drm_gpuva *va);
> +void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
> +
> +struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
> +				 u64 addr, u64 range);
> +struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
> +struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
> +
> +/**
> + * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva is swapped
> + * @va: the &drm_gpuva to set the swap flag of
> + * @swap: indicates whether the &drm_gpuva is swapped
> + */
> +static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
> +{
> +	if (swap)
> +		va->flags |= DRM_GPUVA_SWAPPED;
> +	else
> +		va->flags &= ~DRM_GPUVA_SWAPPED;
> +}
> +
> +/**
> + * drm_gpuva_swapped - indicates whether the backing BO of this &drm_gpuva
> + * is swapped
> + * @va: the &drm_gpuva to check
> + */
> +static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
> +{
> +	return va->flags & DRM_GPUVA_SWAPPED;
> +}
> +
> +/**
> + * drm_gpuva_for_each_va_in_range - iterator to walk over a range of nodes
> + * @node__: &drm_gpuva structure to assign to in each iteration step
> + * @gpuva__: &drm_gpuva_manager structure to walk
> + * @start__: starting offset, the first node will overlap this
> + * @end__: ending offset, the last node will start before this (but may overlap)
> + *
> + * This iterator walks over all nodes in the range allocator that lie
> + * between @start and @end. It is implemented similarly to list_for_each(),
> + * but is using &drm_mm's internal interval tree to accelerate the search for
> + * the starting node, and hence isn't safe against removal of elements. It
> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
> + * backing &drm_mm, and may even continue indefinitely.
> + */
> +#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
> +	for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
> +								  (start__), (end__)-1); \
> +	     node__->node.start < (end__); \
> +	     node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
> +
> +/**
> + * drm_gpuva_for_each_va - iterator to walk over all GPU VA mappings
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @gpuva: &drm_gpuva_manager structure to walk
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gpuva_manager.
> + */
> +#define drm_gpuva_for_each_va(entry, gpuva) \
> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
> +
> +/**
> + * drm_gpuva_for_each_va_safe - iterator to safely walk over all GPU VA
> + * mappings
> + * @entry: &drm_gpuva structure to assign to in each iteration step
> + * @next: &next &drm_gpuva to store the next step
> + * @gpuva: &drm_gpuva_manager structure to walk
> + *
> + * This iterator walks over all &drm_gpuva structures associated with the
> + * &drm_gpuva_manager. It is implemented with list_for_each_safe(), so it is
> + * safe against removal of elements.
> + */
> +#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
> +
> +/**
> + * enum drm_gpuva_op_type - GPU VA operation type
> + *
> + * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager
> + * can be map, remap or unmap operations.
> + */
> +enum drm_gpuva_op_type {
> +	/**
> +	 * @DRM_GPUVA_OP_MAP: the map op type
> +	 */
> +	DRM_GPUVA_OP_MAP,
> +
> +	/**
> +	 * @DRM_GPUVA_OP_REMAP: the remap op type
> +	 */
> +	DRM_GPUVA_OP_REMAP,
> +
> +	/**
> +	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
> +	 */
> +	DRM_GPUVA_OP_UNMAP,
> +};
> +
> +/**
> + * struct drm_gpuva_op_map - GPU VA map operation
> + *
> + * This structure represents a single map operation generated by the
> + * DRM GPU VA manager.
> + */
> +struct drm_gpuva_op_map {
> +	/**
> +	 * @va: structure containing address and range of a map
> +	 * operation
> +	 */
> +	struct {
> +		/**
> +		 * @addr: the base address of the new mapping
> +		 */
> +		u64 addr;
> +
> +		/**
> +		 * @range: the range of the new mapping
> +		 */
> +		u64 range;
> +	} va;
> +
> +	/**
> +	 * @gem: structure containing the &drm_gem_object and its offset
> +	 */
> +	struct {
> +		/**
> +		 * @offset: the offset within the &drm_gem_object
> +		 */
> +		u64 offset;
> +
> +		/**
> +		 * @obj: the &drm_gem_object to map
> +		 */
> +		struct drm_gem_object *obj;
> +	} gem;
> +};
> +
> +/**
> + * struct drm_gpuva_op_unmap - GPU VA unmap operation
> + *
> + * This structure represents a single unmap operation generated by the
> + * DRM GPU VA manager.
> + */
> +struct drm_gpuva_op_unmap {
> +	/**
> +	 * @va: the &drm_gpuva to unmap
> +	 */
> +	struct drm_gpuva *va;
> +
> +	/**
> +	 * @keep:
> +	 *
> +	 * Indicates whether this &drm_gpuva is physically contiguous with the
> +	 * original mapping request.
> +	 *
> +	 * Optionally, if &keep is set, drivers may keep the actual page table
> +	 * mappings for this &drm_gpuva, adding the missing page table entries
> +	 * only and update the &drm_gpuva_manager accordingly.
> +	 */
> +	bool keep;
> +};
> +
> +/**
> + * struct drm_gpuva_op_remap - GPU VA remap operation
> + *
> + * This represents a single remap operation generated by the DRM GPU VA manager.
> + *
> + * A remap operation is generated when an existing GPU VA mapping is split up
> + * by inserting a new GPU VA mapping or by partially unmapping existing
> + * mapping(s), hence it consists of a maximum of two map and one unmap
> + * operation.
> + *
> + * The @unmap operation takes care of removing the original existing mapping.
> + * @prev is used to remap the preceding part, @next the subsequent part.
> + *
> + * If either a new mapping's start address is aligned with the start address
> + * of the old mapping or the new mapping's end address is aligned with the
> + * end address of the old mapping, either @prev or @next is NULL.
> + *
> + * Note, the reason for a dedicated remap operation, rather than arbitrary
> + * unmap and map operations, is to give drivers the chance to extract driver
> + * specific data for creating the new mappings from the unmap operation's
> + * &drm_gpuva structure, which typically is embedded in larger driver specific
> + * structures.
> + */
> +struct drm_gpuva_op_remap {
> +	/**
> +	 * @prev: the preceding part of a split mapping
> +	 */
> +	struct drm_gpuva_op_map *prev;
> +
> +	/**
> +	 * @next: the subsequent part of a split mapping
> +	 */
> +	struct drm_gpuva_op_map *next;
> +
> +	/**
> +	 * @unmap: the unmap operation for the original existing mapping
> +	 */
> +	struct drm_gpuva_op_unmap *unmap;
> +};
> +
> +/**
> + * struct drm_gpuva_op - GPU VA operation
> + *
> + * This structure represents a single generic operation, which can be either
> + * map, unmap or remap.
> + *
> + * The particular type of the operation is defined by @op.
> + */
> +struct drm_gpuva_op {
> +	/**
> +	 * @entry:
> +	 *
> +	 * The &list_head used to distribute instances of this struct within
> +	 * &drm_gpuva_ops.
> +	 */
> +	struct list_head entry;
> +
> +	/**
> +	 * @op: the type of the operation
> +	 */
> +	enum drm_gpuva_op_type op;
> +
> +	union {
> +		/**
> +		 * @map: the map operation
> +		 */
> +		struct drm_gpuva_op_map map;
> +
> +		/**
> +		 * @unmap: the unmap operation
> +		 */
> +		struct drm_gpuva_op_unmap unmap;
> +
> +		/**
> +		 * @remap: the remap operation
> +		 */
> +		struct drm_gpuva_op_remap remap;
> +	};
> +};
> +
> +/**
> + * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
> + */
> +struct drm_gpuva_ops {
> +	/**
> +	 * @list: the &list_head
> +	 */
> +	struct list_head list;
> +};
> +
> +/**
> + * drm_gpuva_for_each_op - iterator to walk over all ops
> + * @op: &drm_gpuva_op to assign in each iteration step
> + * @ops: &drm_gpuva_ops to walk
> + *
> + * This iterator walks over all ops within a given list of operations.
> + */
> +#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
> +
> +/**
> + * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
> + * @op: &drm_gpuva_op to assign in each iteration step
> + * @next: &next &drm_gpuva_op to store the next step
> + * @ops: &drm_gpuva_ops to walk
> + *
> + * This iterator walks over all ops within a given list of operations. It is
> + * implemented with list_for_each_safe(), so it is safe against removal of elements.
> + */
> +#define drm_gpuva_for_each_op_safe(op, next, ops) \
> +	list_for_each_entry_safe(op, next, &(ops)->list, entry)
> +
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
> +			    u64 addr, u64 range,
> +			    struct drm_gem_object *obj, u64 offset);
> +struct drm_gpuva_ops *
> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
> +			      u64 addr, u64 range);
> +void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
> +
> +#endif /* __DRM_GPUVA_MGR_H__ */
> -- 
> 2.39.0
> 
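
As a rough illustration of how drm_gpuva_sm_unmap_ops_create() and the op
iterators above are meant to be consumed, a driver-side unmap path could look
like the sketch below; my_hw_unmap() and my_insert_va() are hypothetical
driver helpers (not part of this series), and locking and error handling are
omitted:

	static int my_driver_unmap(struct drm_gpuva_manager *mgr,
				   u64 req_addr, u64 req_range)
	{
		struct drm_gpuva_ops *ops;
		struct drm_gpuva_op *op;

		ops = drm_gpuva_sm_unmap_ops_create(mgr, req_addr, req_range);
		if (IS_ERR(ops))
			return PTR_ERR(ops);

		drm_gpuva_for_each_op(op, ops) {
			switch (op->op) {
			case DRM_GPUVA_OP_UNMAP:
				/* The mapping is torn down entirely. */
				my_hw_unmap(op->unmap.va);
				drm_gpuva_destroy_unlocked(op->unmap.va);
				break;
			case DRM_GPUVA_OP_REMAP:
				/* The original mapping is split: tear it down
				 * and re-create the remaining prev and/or next
				 * portion(s). my_insert_va() stands in for
				 * allocating a struct drm_gpuva from the
				 * &drm_gpuva_op_map, programming the HW and
				 * calling drm_gpuva_insert().
				 */
				my_hw_unmap(op->remap.unmap->va);
				drm_gpuva_destroy_unlocked(op->remap.unmap->va);
				if (op->remap.prev)
					my_insert_va(mgr, op->remap.prev);
				if (op->remap.next)
					my_insert_va(mgr, op->remap.next);
				break;
			default:
				break;
			}
		}

		drm_gpuva_ops_free(ops);
		return 0;
	}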

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-02 18:31                           ` Danilo Krummrich
@ 2023-02-06  9:48                             ` Christian König
  2023-02-06 13:27                               ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-06  9:48 UTC (permalink / raw)
  To: Danilo Krummrich, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 02.02.23 at 19:31, Danilo Krummrich wrote:
> On 2/2/23 12:53, Christian König wrote:
>> On 01.02.23 at 09:10, Dave Airlie wrote:
>>> [SNIP]
>>>>> For drivers that don't intend to merge at all and (somehow) are
>>>>> capable of dealing with sparse regions without knowing the sparse
>>>>> region's boundaries, it'd be easy to make those gpuva_regions 
>>>>> optional.
>>>> Yeah, but this then defeats the approach of having the same hw
>>>> independent interface/implementation for all drivers.
>>> I think you are running a few steps ahead here. The plan isn't to have
>>> an independent interface, it's to provide a set of routines and
>>> tracking that will be consistent across drivers, so that all drivers
>>> once using them will operate in mostly the same fashion with respect
>>> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
>>> amdgpu and freedreno which I think end up operating slightly different
>>> around lifetimes. I'd like to save future driver writers the effort of
>>> dealing with those decisions and this should drive their user api
>>> design so to enable vulkan sparse bindings.
>>
>> Ok in this case I'm pretty sure this is *NOT* a good idea.
>>
>> See this means that we define the UAPI implicitly by saying to 
>> drivers to use a common framework for their VM implementation which 
>> then results in behavior A,B,C,D....
>>
>> If a driver strides away from this common framework because it has 
>> different requirements based on how his hw work you certainly get 
>> different behavior again (and you have tons of hw specific 
>> requirements in here).
>>
>> What we should do instead if we want to have some common handling 
>> among drivers (which I totally agree on makes sense) then we should 
>> define the UAPI explicitly.
>
> By asking that I don't want to say I'm against this idea, I'm just 
> wondering how it becomes easier to deal with "tons of hw specific 
> requirements" by generalizing things even more?

I'm already maintaining two different GPU VM solutions in the GPU 
drivers in the kernel, radeon and amdgpu. The hw they drive is 
identical, just the UAPI is different, and only because of the different 
UAPI they can't share the same VM backend implementation.

The hw stuff is completely abstractable. That's just stuff you need to 
consider when defining the structures you pass around.

But a messed up UAPI is sometimes impossible to fix because of backward 
compatibility.

We learned that the hard way with radeon and mostly fixed it by coming 
up with a completely new implementation for amdgpu.

> What makes us think that we do a better job in considering all hw 
> specific requirements with a unified UAPI than with a more lightweight 
> generic component for tracking VA mappings?

Because this defines the UAPI implicitly and that's seldom a good idea.

As I said before, tracking is the easy part of the job. Defining this 
generic component helps a little bit when writing new drivers, but it leaves 
way too much room for speculation about the UAPI.

> Also, wouldn't we need something like the GPUVA manager as part of a 
> unified UAPI?

Not necessarily. We can write components to help drivers implement the 
UAPI, but this isn't mandatory.

>
>>
>> For example we could have a DRM_IOCTL_GPU_VM which takes both driver 
>> independent as well as driver dependent information and then has the 
>> documented behavior:
>> a) VAs do (or don't) vanish automatically when the GEM handle is closed.
>> b) GEM BOs do (or don't) get an additional reference for each VM they 
>> are used in.
>> c) Can handle some common use cases driver independent (BO mappings, 
>> readonly, writeonly, sparse etc...).
>> d) Has a well defined behavior when the operation is executed async. 
>> E.g. in/out fences.
>> e) Can still handle hw specific stuff like (for example) trap on 
>> access etc....
>> ...
>>
>> Especially d is what Bas and I have pretty much already created a 
>> prototype for the amdgpu specific IOCTL for, but essentially this is 
>> completely driver independent and actually the more complex stuff. 
>> Compared to that common lifetime of BOs is just nice to have.
>>
>> I strongly think we should concentrate on getting this right as well.
>>
>>> Now if merging is a feature that makes sense to one driver maybe it
>>> makes sense to all, however there may be reasons amdgpu gets away
>>> without merging that other drivers might not benefit from, there might
>>> also be a benefit to amdgpu from merging that you haven't looked at
>>> yet, so I think we could leave merging as an optional extra driver
>>> knob here. The userspace API should operate the same, it would just be
>>> the gpu pagetables that would end up different sizes.
>>
>> Yeah, agree completely. The point is that we should not have 
>> complexity inside the kernel which is not necessarily needed in the 
>> kernel.
>>
>> So merging or not is something we have gone back and forth for 
>> amdgpu, one the one hand it reduces the memory footprint of the 
>> housekeeping overhead on the other hand it makes the handling more 
>> complex, error prone and use a few more CPU cycles.
>>
>> For amdgpu merging is mostly beneficial when you can get rid of a 
>> whole page tables layer in the hierarchy, but for this you need to 
>> merge at least 2MiB or 1GiB together. And since that case doesn't 
>> happen that often we stopped doing it.
>>
>> But for my understanding why you need the ranges for the merging? 
>> Isn't it sufficient to check that the mappings have the same type, 
>> flags, BO, whatever backing them?
>
> Not entirely. Let's assume userspace creates two virtually contiguous 
> buffers (VKBuffer) A and B. Userspace could bind a BO with BO offset 0 
> to A (binding 1) and afterwards bind the same BO with BO offset 
> length(A) to B (binding 2), maybe unlikely but AFAIK not illegal.
>
> If we don't know about the bounds of A and B in the kernel, we detect 
> that both bindings are virtually and physically contiguous and we 
> merge them.

Well as far as I can see this is actually legal and desirable.

>
> In the best case this was simply useless, because we'll need to split 
> them anyway later on when A or B is destroyed, but in the worst case 
> we could fault the GPU, e.g. if merging leads to a change of the page 
> tables that are backing binding 1, but buffer A is already in use by 
> userspace.

WOW wait a second, regions absolutely don't help you with that anyway.

You need to keep track of which mappings are in use, otherwise any 
modification could lead to problems.

In other words, when the GPU is already using A you *must* have a fence on 
the page tables backing A to prevent their destruction.
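
In the simplest form that boils down to something like the following sketch
whenever a job that uses A is submitted; pt_obj and job_fence are placeholders
for the page table BO and the job's fence, and error handling is omitted:

	/* reserve a fence slot and note the job on the page table BO */
	dma_resv_reserve_fences(pt_obj->resv, 1);
	dma_resv_add_fence(pt_obj->resv, job_fence, DMA_RESV_USAGE_BOOKKEEP);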

>
> In Nouveau, I think we could also get rid of regions and do something 
> driver specific for the handling of the dual page tables, which I want 
> to use for sparse regions *and* just don't merge (at least for now). 
> But exactly for the sake of not limiting drivers in their HW specifics 
> I thought it'd be great if merging is supported in case it makes sense 
> for a specific HW, especially given the fact that memory sizes are 
> increasing.

What do you mean by that?

If you want your page tables to be modifiable while the GPU is using 
them (which is basically a standard requirement for sparse bindings in 
Vulkan) you need double housekeeping anyway.

Those regions strongly sound like you are pushing stuff into the kernel 
which should be handled in userspace.

Regards,
Christian.

>
>
>>
>> Regards,
>> Christian.
>>
>>
>>>
>>> Dave.
>>
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-06  9:48                             ` Christian König
@ 2023-02-06 13:27                               ` Danilo Krummrich
  2023-02-06 16:14                                 ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-06 13:27 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 2/6/23 10:48, Christian König wrote:
> On 02.02.23 at 19:31, Danilo Krummrich wrote:
>> On 2/2/23 12:53, Christian König wrote:
>>> On 01.02.23 at 09:10, Dave Airlie wrote:
>>>> [SNIP]
>>>>>> For drivers that don't intend to merge at all and (somehow) are
>>>>>> capable of dealing with sparse regions without knowing the sparse
>>>>>> region's boundaries, it'd be easy to make those gpuva_regions 
>>>>>> optional.
>>>>> Yeah, but this then defeats the approach of having the same hw
>>>>> independent interface/implementation for all drivers.
>>>> I think you are running a few steps ahead here. The plan isn't to have
>>>> an independent interface, it's to provide a set of routines and
>>>> tracking that will be consistent across drivers, so that all drivers
>>>> once using them will operate in mostly the same fashion with respect
>>>> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
>>>> amdgpu and freedreno which I think end up operating slightly different
>>>> around lifetimes. I'd like to save future driver writers the effort of
>>>> dealing with those decisions and this should drive their user api
>>>> design so to enable vulkan sparse bindings.
>>>
>>> Ok in this case I'm pretty sure this is *NOT* a good idea.
>>>
>>> See this means that we define the UAPI implicitly by saying to 
>>> drivers to use a common framework for their VM implementation which 
>>> then results in behavior A,B,C,D....
>>>
>>> If a driver strides away from this common framework because it has 
>>> different requirements based on how his hw work you certainly get 
>>> different behavior again (and you have tons of hw specific 
>>> requirements in here).
>>>
>>> What we should do instead if we want to have some common handling 
>>> among drivers (which I totally agree on makes sense) then we should 
>>> define the UAPI explicitly.
>>
>> By asking that I don't want to say I'm against this idea, I'm just 
>> wondering how it becomes easier to deal with "tons of hw specific 
>> requirements" by generalizing things even more?
> 
> I'm already maintaining two different GPU VM solutions in the GPU 
> drivers in the kernel, radeon and amdgpu. The hw they driver is 
> identical, just the UAPI is different. And only because of the different 
> UAPI they can't have the same VM backend implementation.
> 
> The hw stuff is completely abstract able. That's just stuff you need to 
> consider when defining the structures you pass around.

Wouldn't we need to have strict limitations on that, such that HW 
specific structures / fields are not allowed to break the semantics of 
the UAPI? Because otherwise we wouldn't be able to attach generalized 
components to the unified UAPI, which ultimately would be the whole 
purpose. So, if this consideration is correct, I'd still see a risk of 
drivers straying away from it because of their requirements. Again, I 
think a unified UAPI is a good idea, but it sounds more difficult to me 
than this last paragraph implies.

> 
> But a messed up UAPI is sometimes impossible to fix because of backward 
> compatibility.
> 
> We learned that the hard way with radeon and mostly fixed it by coming 
> up with a completely new implementation for amdgpu.
> 
>> What makes us think that we do a better job in considering all hw 
>> specific requirements with a unified UAPI than with a more lightweight 
>> generic component for tracking VA mappings?
> 
> Because this defines the UAPI implicitly and that's seldom a good idea.
> 
> As I said before tracking is the easy part of the job. Defining this 
> generic component helps a little bit writing new drivers, but it leaves 
> way to much room for speculations on the UAPI.
> 

Trying to move forward, I agree that a unified UAPI would improve the 
situation regarding the problems you mentioned and the examples you have 
given.

However, not having the GPUVA manager wouldn't give us a unified UAPI 
either. And as long as it delivers a generic component to solve a 
problem while not making the overall situation worse or preventing us 
from reaching this desirable goal of having a unified UAPI, I tend to 
think it's fine to have such a component.

>> Also, wouldn't we need something like the GPUVA manager as part of a 
>> unified UAPI?
> 
> Not necessarily. We can write components to help drivers implement the 
> UAPI, but this isn't mandatory.

Well, yes, not necessarily. However, as mentioned above, wouldn't it be 
a major goal of a unified UAPI to be able to attach generic components 
to it?

> 
>>
>>>
>>> For example we could have a DRM_IOCTL_GPU_VM which takes both driver 
>>> independent as well as driver dependent information and then has the 
>>> documented behavior:
>>> a) VAs do (or don't) vanish automatically when the GEM handle is closed.
>>> b) GEM BOs do (or don't) get an additional reference for each VM they 
>>> are used in.
>>> c) Can handle some common use cases driver independent (BO mappings, 
>>> readonly, writeonly, sparse etc...).
>>> d) Has a well defined behavior when the operation is executed async. 
>>> E.g. in/out fences.
>>> e) Can still handle hw specific stuff like (for example) trap on 
>>> access etc....
>>> ...
>>>
>>> Especially d is what Bas and I have pretty much already created a 
>>> prototype for the amdgpu specific IOCTL for, but essentially this is 
>>> completely driver independent and actually the more complex stuff. 
>>> Compared to that common lifetime of BOs is just nice to have.
>>>
>>> I strongly think we should concentrate on getting this right as well.
>>>
>>>> Now if merging is a feature that makes sense to one driver maybe it
>>>> makes sense to all, however there may be reasons amdgpu gets away
>>>> without merging that other drivers might not benefit from, there might
>>>> also be a benefit to amdgpu from merging that you haven't looked at
>>>> yet, so I think we could leave merging as an optional extra driver
>>>> knob here. The userspace API should operate the same, it would just be
>>>> the gpu pagetables that would end up different sizes.
>>>
>>> Yeah, agree completely. The point is that we should not have 
>>> complexity inside the kernel which is not necessarily needed in the 
>>> kernel.
>>>
>>> So merging or not is something we have gone back and forth for 
>>> amdgpu, one the one hand it reduces the memory footprint of the 
>>> housekeeping overhead on the other hand it makes the handling more 
>>> complex, error prone and use a few more CPU cycles.
>>>
>>> For amdgpu merging is mostly beneficial when you can get rid of a 
>>> whole page tables layer in the hierarchy, but for this you need to 
>>> merge at least 2MiB or 1GiB together. And since that case doesn't 
>>> happen that often we stopped doing it.
>>>
>>> But for my understanding why you need the ranges for the merging? 
>>> Isn't it sufficient to check that the mappings have the same type, 
>>> flags, BO, whatever backing them?
>>
>> Not entirely. Let's assume userspace creates two virtually contiguous 
>> buffers (VKBuffer) A and B. Userspace could bind a BO with BO offset 0 
>> to A (binding 1) and afterwards bind the same BO with BO offset 
>> length(A) to B (binding 2), maybe unlikely but AFAIK not illegal.
>>
>> If we don't know about the bounds of A and B in the kernel, we detect 
>> that both bindings are virtually and physically contiguous and we 
>> merge them.
> 
> Well as far as I can see this is actually legal and desirable.

Legal? Not sure, that may depend on the semantics of the UAPI. (More on that 
below your next paragraph.)

Desirable? I don't think so. Since those mappings are associated with 
different VKBuffers they get split up again later on anyway, so why bother 
merging?

> 
>>
>> In the best case this was simply useless, because we'll need to split 
>> them anyway later on when A or B is destroyed, but in the worst case 
>> we could fault the GPU, e.g. if merging leads to a change of the page 
>> tables that are backing binding 1, but buffer A is already in use by 
>> userspace.
> 
> WOW wait a second, regions absolutely don't help you with that anyway.
> 
> You need to keep track which mappings are used or otherwise any 
> modification could lead to problems.
> 
> In other words when the GPU already uses A you *must* have a fence on 
> the page tables backing A to prevent their destruction.
> 

As mentioned above, I'm not entirely sure about that and it might just 
depend on the semantics of the UAPI.

My understanding is that userspace is fully responsible for the parts of 
the GPU VA space it owns. This means that userspace needs to take care 
*not* to ask the kernel to modify mappings that are currently in use. 
Hence, the kernel must not modify mappings it set up on behalf of 
userspace unless userspace explicitly asks it to do so.

If those are valid preconditions, and based on them we want to support 
merging, the kernel must know about the VA space allocations (or 
VKBuffers in userspace terminology) to make sure it never merges across 
their boundaries, which might not make much sense anyway.

>>
>> In Nouveau, I think we could also get rid of regions and do something 
>> driver specific for the handling of the dual page tables, which I want 
>> to use for sparse regions *and* just don't merge (at least for now). 
>> But exactly for the sake of not limiting drivers in their HW specifics 
>> I thought it'd be great if merging is supported in case it makes sense 
>> for a specific HW, especially given the fact that memory sizes are 
>> increasing.
> 
> What do you mean with that?
> 
> If you want your page tables to be modifiable while the GPU is using 
> them (which is basically a standard requirement from sparse bindings in 
> Vulkan) you need double housekeeping anyway.
> 
> Those regions strongly sound like you are pushing stuff which should be 
> handled in userspace inside the kernel.

1. userspace allocates a new VKBuffer with the sparse bit set (0x0 - 
0x800000)

2. kernel creates a new region structure with the range 0x800000 and 
creates a new PT (A) with 4 PTEs with the sparse flag set (page shift is 21)

3. userspace requests a memory backed mapping at 0x200000 with size 0x2000

4. kernel creates a new mapping structure with base address 0x200000 and 
range 0x2000 and creates a new PT (B) with 2 PTEs (page shift is 12) 
"overlaying" PT A

5. userspace crashes unexpectedly for some reason

6. kernel needs to clean things up, iterates the list of mappings and 
unmaps them (PT B is freed); kernel iterates all regions and removes 
them (PT A is freed)
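
For illustration, a rough sketch of what step 6 could look like on top of the
proposed helpers; my_hw_unmap() and my_hw_unmap_sparse() are hypothetical
driver callbacks freeing PT B and PT A respectively, and locking as well as
the kernel reserved node are ignored:

	static void my_driver_cleanup(struct drm_gpuva_manager *mgr)
	{
		struct drm_gpuva *va, *next_va;
		struct drm_gpuva_region *reg, *next_reg;

		/* Tear down the remaining memory backed mappings first, ... */
		drm_gpuva_for_each_va_safe(va, next_va, mgr) {
			my_hw_unmap(va);		/* frees PT B */
			drm_gpuva_destroy_unlocked(va);
		}

		/* ... then drop the regions and their sparse page tables. */
		drm_gpuva_for_each_region_safe(reg, next_reg, mgr) {
			my_hw_unmap_sparse(reg);	/* frees PT A */
			drm_gpuva_region_destroy(mgr, reg);
		}
	}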

> 
> Regards,
> Christian.
> 
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>>
>>>> Dave.
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-02-03 17:37   ` Matthew Brost
@ 2023-02-06 13:35     ` Christian König
  2023-02-06 13:46       ` Danilo Krummrich
  2023-02-14 11:52     ` Danilo Krummrich
  1 sibling, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-06 13:35 UTC (permalink / raw)
  To: Matthew Brost, Danilo Krummrich
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 03.02.23 at 18:37, Matthew Brost wrote:
> On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
>> This adds the infrastructure for a manager implementation to keep track
>> of GPU virtual address (VA) mappings.
>>
>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>> start implementing, allow userspace applications to request multiple and
>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>> intended to serve the following purposes in this context.
>>
>> 1) Provide a dedicated range allocator to track GPU VA allocations and
>>     mappings, making use of the drm_mm range allocator.
>>
>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>     particular DRM GEM objects.
>>
>> 3) Provide a common implementation to perform more complex mapping
>>     operations on the GPU VA space. In particular splitting and merging
>>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>     unmap requests.
>>
> Over the past week I've hacked together a PoC port of Xe to GPUVA [1], so
> far it seems really promising. It's 95% of the way to being feature
> equivalent to the current Xe VM bind implementation and I have line of
> sight to getting sparse bindings implemented on top of GPUVA too. IMO
> this has basically everything we need for Xe with a few tweaks.
>
> I am out until 2/14 but wanted to get my thoughts / suggestions out on
> the list before I leave.
>
> 1. GPUVA post didn't support the way Xe does userptrs - a NULL GEM. I
> believe with [2], [3], and [4] GPUVA will support NULL GEMs. Also my
> thinking is that sparse binds will also have NULL GEMs; more on sparse bindings
> below.
>
> 2. I agree with Christian that drm_mm probably isn't what we want to
> base the GPUVA implementation on, rather an RB tree or Maple tree has
> been discussed. The implementation should be fairly easy to tune once we
> have benchmarks running, so I'm not too concerned here as we can figure this
> out down the line.
>
> 3. In Xe we want to create an xe_vm_op list which inherits from drm_gpuva_op
> I've done this with a hack [5], I believe when we rebase we can do this
> with a custom callback to allocate a large op size.
>
> 4. I'd like to add user bits to drm_gpuva_flags like I do in [6]. This is
> similar to DMA_FENCE_FLAG_USER_BITS.
>
> 5. In Xe we have a VM prefetch operation which is needed for our compute
> UMD with page faults. I'd like to add a prefetch type of operation like we do
> in [7].
>
> 6. In Xe we have a VM 'unbind all mappings for a GEM' IOCTL; I'd like to add
> support to generate this operation list to GPUVA like we do in [8].
>
> 7. I've thought about how Xe will implement sparse mappings (read 0,
> writes dropped). My current thinking is a sparse mapping will be
> represented as a drm_gpuva rather than a region like in Nouveau. Making
> regions optional seems like a good idea to me rather than forcing the
> user of GPUVA code to create 1 large region for the manager as I
> currently do in the Xe PoC.

From Danilo's explanation I'm now pretty sure that regions won't work 
for Nouveau either.

He seems to be relying on the incorrect assumption that applications won't 
change the sparse mappings behind a VkBuffer while the GPU is using it.

As far as I can see games like Forza won't work with this approach.

>
> 8. Personally I'd like the caller to own the locking for GEM drm_gpuva
> list (drm_gpuva_link_*, drm_gpuva_unlink_* functions). In Xe we almost
> certainly will have the GEM dma-resv lock when we touch this list so an
> extra lock here is redundant. Also it's kinda goofy that the caller owns the
> locking for drm_gpuva insertion / removal but not the locking for this list.
>
> WRT Christian's thoughts on common uAPI rules for VM binds, I kinda
> like that idea but I don't think it is necessary. All of our uAPIs
> should be close but also the GPUVA implementation should be flexible
> enough to fit all of our needs and I think for the most part it is.

Maybe I should refine my concerns: a common component for GPUVM mappings 
is a good idea, but we should not expect it to define any driver 
independent UAPI.

If we want to define driver independent UAPI, we should do so explicitly.

Christian.

>
> Let me know what everyone thinks about this. It would be great if when
> I'm back on 2/14 I can rebase the Xe port to GPUVA on another version of
> the GPUVA code and get sparse binding support implementation. Also I'd
> like to get GPUVA merged in the Xe repo ASAP as our VM bind code badly
> needed to be cleaned up and this was the push we needed to make this
> happen.
>
> Matt
>
> [1] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314
> [2] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=2ae21d7a3f52e5eb2c105ed8ae231471274bdc36
> [3] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=49fca9f5d96201f5cbd1b19c7ff17eedfac65cdc
> [4] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=61fa6b1e1f10e791ae82358fa971b04421d53024
> [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=87fc08dcf0840e794b38269fe4c6a95d088d79ec
> [6] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=a4826c22f6788bc29906ffa263c1cd3c4661fa77
> [7] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=f008bbb55b213868e52c7b9cda4c1bfb95af6aee
> [8] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=41f4f71c05d04d2b17d988dd95369b5df2d7f681
>
>> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/drm-mm.rst    |   31 +
>>   drivers/gpu/drm/Makefile        |    1 +
>>   drivers/gpu/drm/drm_gem.c       |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
>>   include/drm/drm_drv.h           |    6 +
>>   include/drm/drm_gem.h           |   75 ++
>>   include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
>>   7 files changed, 1966 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index a52e6f4117d6..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>      :export:
>>   
>> +DRM GPU VA Manager
>> +==================
>> +
>> +Overview
>> +--------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Overview
>> +
>> +Split and Merge
>> +---------------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Split and Merge
>> +
>> +Locking
>> +-------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Locking
>> +
>> +
>> +DRM GPU VA Manager Function References
>> +--------------------------------------
>> +
>> +.. kernel-doc:: include/drm/drm_gpuva_mgr.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :export:
>> +
>>   DRM Buddy Allocator
>>   ===================
>>   
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 4fe190aee584..de2ffca3b6e4 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -45,6 +45,7 @@ drm-y := \
>>   	drm_vblank.o \
>>   	drm_vblank_work.o \
>>   	drm_vma_manager.o \
>> +	drm_gpuva_mgr.o \
>>   	drm_writeback.o
>>   drm-$(CONFIG_DRM_LEGACY) += \
>>   	drm_agpsupport.o \
>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>> index 59a0bb5ebd85..65115fe88627 100644
>> --- a/drivers/gpu/drm/drm_gem.c
>> +++ b/drivers/gpu/drm/drm_gem.c
>> @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>>   	if (!obj->resv)
>>   		obj->resv = &obj->_resv;
>>   
>> +	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
>> +		drm_gem_gpuva_init(obj);
>> +
>>   	drm_vma_node_reset(&obj->vma_node);
>>   	INIT_LIST_HEAD(&obj->lru_node);
>>   }
>> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
>> new file mode 100644
>> index 000000000000..e665f642689d
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
>> @@ -0,0 +1,1323 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2022 Red Hat.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *     Danilo Krummrich <dakr@redhat.com>
>> + *
>> + */
>> +
>> +#include <drm/drm_gem.h>
>> +#include <drm/drm_gpuva_mgr.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
>> + * of a GPU's virtual address (VA) space and manages the corresponding virtual
>> + * mappings represented by &drm_gpuva objects. It also keeps track of the
>> + * mapping's backing &drm_gem_object buffers.
>> + *
>> + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
>> + * &drm_gpuva objects representing all existent GPU VA mappings using this
>> + * &drm_gem_object as backing buffer.
>> + *
>> + * A GPU VA mapping can only be created within a previously allocated
>> + * &drm_gpuva_region, which represents a reserved portion of the GPU VA space.
>> + * GPU VA mappings are not allowed to span over a &drm_gpuva_region's boundary.
>> + *
>> + * GPU VA regions can also be flagged as sparse, which allows drivers to create
>> + * sparse mappings for a whole GPU VA region in order to support Vulkan
>> + * 'Sparse Resources'.
>> + *
>> + * The GPU VA manager internally uses the &drm_mm range allocator to manage the
>> + * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
>> + * space.
>> + *
>> + * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver,
>> + * the &drm_gpuva_manager contains a special region representing the portion of
>> + * VA space reserved by the kernel. This node is initialized together with the
>> + * GPU VA manager instance and removed when the GPU VA manager is destroyed.
>> + *
>> + * In a typical application drivers would embed struct drm_gpuva_manager,
>> + * struct drm_gpuva_region and struct drm_gpuva within their own driver
>> + * specific structures; hence, the manager does not perform any memory
>> + * allocations of its own, nor does it allocate &drm_gpuva or
>> + * &drm_gpuva_region entries.
>> + */
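To illustrate the embedding scheme described above, a minimal sketch using
hypothetical example_* structures (nothing here is taken from this series):

#include <drm/drm_gpuva_mgr.h>
#include <linux/mutex.h>

/* one VA space per client/channel */
struct example_vm {
	struct drm_gpuva_manager base;	/* embedded manager */
	struct mutex lock;		/* driver lock, see DOC: Locking */
};

/* a reserved portion of the VA space */
struct example_vma_region {
	struct drm_gpuva_region base;	/* embedded region */
	/* driver specific region state */
};

/* a single mapping */
struct example_vma {
	struct drm_gpuva base;		/* embedded mapping */
	/* driver specific mapping state, e.g. page table handles */
};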
>> +
>> +/**
>> + * DOC: Split and Merge
>> + *
>> + * The DRM GPU VA manager also provides an algorithm implementing splitting and
>> + * merging of existent GPU VA mappings with the ones that are requested to be
>> + * mapped or unmapped. This feature is required by the Vulkan API to implement
>> + * Vulkan 'Sparse Memory Bindings' - driver UAPIs often refer to this as
>> + * VM BIND.
>> + *
>> + * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list of map, unmap
>> + * and remap operations for a given newly requested mapping. This list
>> + * represents the set of operations to execute in order to integrate the new
>> + * mapping cleanly into the current state of the GPU VA space.
>> + *
>> + * Depending on how the new GPU VA mapping intersects with the existent mappings
>> + * of the GPU VA space the &drm_gpuva_ops contain an arbitrary number of unmap
>> + * operations, a maximum of two remap operations and a single map operation.
>> + * The set of operations can also be empty if no operation is required, e.g. if
>> + * the requested mapping already exists in the exact same way.
>> + *
>> + * The single map operation, if existent, represents the original map operation
>> + * requested by the caller. Please note that this operation might be altered
>> + * compared to the original map operation, e.g. because it was merged with an
>> + * already existent mapping. Hence, drivers must execute this map operation
>> + * instead of the original one they passed to drm_gpuva_sm_map_ops_create().
>> + *
>> + * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
>> + * &drm_gpuva to unmap is physically contiguous with the original mapping
>> + * request. Optionally, if 'keep' is set, drivers may keep the actual page table
>> + * entries for this &drm_gpuva, only adding the missing page table entries and
>> + * updating the &drm_gpuva_manager's view accordingly.
>> + *
>> + * Drivers may do the same optimization, namely delta page table updates, also
>> + * for remap operations. This is possible since &drm_gpuva_op_remap consists of
>> + * one unmap operation and one or two map operations, such that drivers can
>> + * derive the page table update delta accordingly.
>> + *
>> + * Note that there can't be more than two existent mappings to split up, one at
>> + * the beginning and one at the end of the new mapping, hence there is a
>> + * maximum of two remap operations.
>> + *
>> + * Generally, the DRM GPU VA manager never merges mappings across the
>> + * boundaries of &drm_gpuva_regions. This is the case since merging between
>> + * GPU VA regions would result in unmap and map operations being issued for
>> + * both regions involved, although the original mapping request referred to
>> + * one specific GPU VA region only. Since the other GPU VA region, the one not
>> + * explicitly requested to be altered, might be in use by the GPU, we are not
>> + * allowed to issue any map/unmap operations for this region.
>> + *
>> + * Note that before calling drm_gpuva_sm_map_ops_create() again with another
>> + * mapping request it is necessary to update the &drm_gpuva_manager's view of
>> + * the GPU VA space. The previously obtained operations must be either fully
>> + * processed or completely abandoned.
>> + *
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
>> + * provides drivers the list of operations to be executed in order to unmap
>> + * a range of GPU VA space. The logic behind this function is much simpler
>> + * though: for all existent mappings enclosed by the given range, unmap
>> + * operations are created. For mappings which are only partially located within
>> + * the given range, remap operations are created such that those mappings are
>> + * split up and re-mapped partially.
>> + *
>> + * The following diagrams depict the basic cases of existent GPU VA mappings,
>> + * a newly requested mapping and the resulting mappings as implemented by
>> + * drm_gpuva_sm_map_ops_create() - they don't cover arbitrary combinations
>> + * of these cases.
>> + *
>> + * ::
>> + *
>> + *	1) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	2) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=m)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=m)
>> + *
>> + *
>> + *	3) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	4) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0  a  1
>> + *	old: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or bo_offset.
>> + *
>> + *
>> + *	5) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  b  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0  b  1  a' 2
>> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	6) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  a  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	7) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----| (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2
>> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	8) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----| (bo_offset=n+1)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	9) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     b     3
>> + *	req:       |-----------| (bo_offset=m)
>> + *
>> + *	     0  a  1     b     3
>> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	10) Existent mapping is merged.
>> + *	-------------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     a     3
>> + *	req:       |-----------| (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	new: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	11) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----|       (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2  a' 3
>> + *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>> + *
>> + *
>> + *	12) Existent mapping is kept.
>> + *	-----------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----|       (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	new: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	13) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	14) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	req: |----------------| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	new: |----------------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	15) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	           1     a     3
>> + *	old:       |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     2
>> + *	req: |-----------|       (bo_offset=m)
>> + *
>> + *	     0     b     2  a' 3
>> + *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>> + *
>> + *
>> + *	16) Existent mappings are merged.
>> + *	---------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------|                        (bo_offset=n)
>> + *
>> + *	                            2     a     3
>> + *	old':                       |-----------| (bo_offset=n+2)
>> + *
>> + *	                1     a     2
>> + *	req:            |-----------|             (bo_offset=n+1)
>> + *
>> + *	                      a
>> + *	new: |----------------------------------| (bo_offset=n)
>> + */
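To make the ops processing described in this section a bit more tangible, here
is a rough sketch of a driver's bind path consuming the returned
&drm_gpuva_ops; the example_vm_bind() helper and all page table handling
hinted at in the comments are hypothetical:

#include <drm/drm_gem.h>
#include <drm/drm_gpuva_mgr.h>
#include <linux/err.h>

static int example_vm_bind(struct drm_gpuva_manager *mgr,
			   u64 addr, u64 range,
			   struct drm_gem_object *obj, u64 offset)
{
	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:
			/* program PTEs for op->map.va.addr/range, backed by
			 * op->map.gem.obj at op->map.gem.offset, then allocate
			 * a driver VA object, drm_gpuva_insert() and
			 * drm_gpuva_link_unlocked() it
			 */
			break;
		case DRM_GPUVA_OP_REMAP:
			/* unmap op->remap.unmap->va (or only apply the PTE
			 * delta), drm_gpuva_destroy_unlocked() it and re-map
			 * op->remap.prev and/or op->remap.next
			 */
			break;
		case DRM_GPUVA_OP_UNMAP:
			/* tear down the PTEs unless op->unmap.keep is set,
			 * then drm_gpuva_destroy_unlocked() the mapping
			 */
			break;
		}
	}

	drm_gpuva_ops_free(ops);
	return 0;
}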
>> +
>> +/**
>> + * DOC: Locking
>> + *
>> + * Generally, the GPU VA manager does not take care of locking itself; it is
>> + * the driver's responsibility to take care of locking. Drivers might want to
>> + * protect the following operations: inserting, destroying and iterating
>> + * &drm_gpuva and &drm_gpuva_region objects as well as generating split and merge
>> + * operations.
>> + *
>> + * The GPU VA manager does take care of the locking of the backing
>> + * &drm_gem_object buffers' GPU VA lists though, unless the respective
>> + * function's documentation states otherwise.
>> + */
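A small sketch of what that split of responsibilities could look like; vm_lock
stands in for a hypothetical driver-side lock:

#include <drm/drm_gpuva_mgr.h>
#include <linux/mutex.h>

static int example_insert_and_link(struct drm_gpuva_manager *mgr,
				   struct mutex *vm_lock,
				   struct drm_gpuva *va,
				   u64 addr, u64 range)
{
	int ret;

	/* va->gem.obj and va->gem.offset are expected to be set up by the
	 * caller
	 */
	mutex_lock(vm_lock);	/* protects insert/destroy/iterate */
	ret = drm_gpuva_insert(mgr, va, addr, range);
	mutex_unlock(vm_lock);
	if (ret)
		return ret;

	/* takes the GEM object's GPU VA list mutex internally */
	drm_gpuva_link_unlocked(va);

	return 0;
}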
>> +
>> +/**
>> + * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
>> + * @mgr: pointer to the &drm_gpuva_manager to initialize
>> + * @name: the name of the GPU VA space
>> + * @start_offset: the start offset of the GPU VA space
>> + * @range: the size of the GPU VA space
>> + * @reserve_offset: the start of the kernel reserved GPU VA area
>> + * @reserve_range: the size of the kernel reserved GPU VA area
>> + *
>> + * The &drm_gpuva_manager must be initialized with this function before use.
>> + *
>> + * Note that @mgr must be cleared to 0 before calling this function. The given
>> + * @name is expected to be managed by the surrounding driver structures.
>> + */
>> +void
>> +drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>> +		       const char *name,
>> +		       u64 start_offset, u64 range,
>> +		       u64 reserve_offset, u64 reserve_range)
>> +{
>> +	drm_mm_init(&mgr->va_mm, start_offset, range);
>> +	drm_mm_init(&mgr->region_mm, start_offset, range);
>> +
>> +	mgr->mm_start = start_offset;
>> +	mgr->mm_range = range;
>> +
>> +	mgr->name = name ? name : "unknown";
>> +
>> +	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
>> +	mgr->kernel_alloc_node.start = reserve_offset;
>> +	mgr->kernel_alloc_node.size = reserve_range;
>> +	drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_manager_init);
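Purely illustrative usage, assuming addresses are managed in bytes and the
lowest 4 KiB are reserved for the kernel (numbers and names are made up):

#include <drm/drm_gpuva_mgr.h>

static struct drm_gpuva_manager example_mgr;	/* static, hence zeroed */

static void example_vm_setup(void)
{
	/* 48 bit VA space, first 4 KiB reserved for the kernel */
	drm_gpuva_manager_init(&example_mgr, "example",
			       0, 1ull << 48,	/* start, range */
			       0, 0x1000);	/* kernel reserve */
}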
>> +
>> +/**
>> + * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
>> + * @mgr: pointer to the &drm_gpuva_manager to clean up
>> + *
>> + * Note that it is a bug to call this function on a manager that still
>> + * holds GPU VA mappings.
>> + */
>> +void
>> +drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>> +{
>> +	mgr->name = NULL;
>> +	drm_mm_remove_node(&mgr->kernel_alloc_node);
>> +	drm_mm_takedown(&mgr->va_mm);
>> +	drm_mm_takedown(&mgr->region_mm);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>> +
>> +static struct drm_gpuva_region *
>> +drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +
>> +	/* Find the VA region the requested range is strictly enclosed by. */
>> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
>> +		if (reg->node.start <= addr &&
>> +		    reg->node.start + reg->node.size >= addr + range &&
>> +		    &reg->node != &mgr->kernel_alloc_node)
>> +			return reg;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static bool
>> +drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>> +{
>> +	return !!drm_gpuva_in_region(mgr, addr, range);
>> +}
>> +
>> +/**
>> + * drm_gpuva_insert - insert a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>> + * @va: the &drm_gpuva to insert
>> + * @addr: the start address of the GPU VA
>> + * @range: the range of the GPU VA
>> + *
>> + * Insert a &drm_gpuva with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>> +		 struct drm_gpuva *va,
>> +		 u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +	int ret;
>> +
>> +	if (!va->gem.obj)
>> +		return -EINVAL;
>> +
>> +	reg = drm_gpuva_in_region(mgr, addr, range);
>> +	if (!reg)
>> +		return -EINVAL;
>> +
>> +	ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
>> +					  range, 0,
>> +					  0, addr,
>> +					  addr + range,
>> +					  DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	va->mgr = mgr;
>> +	va->region = reg;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_insert);
>> +
>> +/**
>> + * drm_gpuva_link_locked - link a &drm_gpuva
>> + * @va: the &drm_gpuva to link
>> + *
>> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_link_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +	list_add_tail(&va->head, &va->gem.obj->gpuva.list);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_link_locked);
>> +
>> +/**
>> + * drm_gpuva_link_unlocked - link a &drm_gpuva
>> + * @va: the &drm_gpuva to link
>> + *
>> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_link_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	drm_gpuva_link_locked(va);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_link_unlocked);
>> +
>> +/**
>> + * drm_gpuva_unlink_locked - unlink a &drm_gpuva
>> + * @va: the &drm_gpuva to unlink
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_unlink_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +	list_del_init(&va->head);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_unlink_locked);
>> +
>> +/**
>> + * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
>> + * @va: the &drm_gpuva to unlink
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	drm_gpuva_unlink_locked(va);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
>> +
>> +/**
>> + * drm_gpuva_destroy_locked - destroy a &drm_gpuva
>> + * @va: the &drm_gpuva to destroy
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with and removes it from the underlying range allocator.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_destroy_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +
>> +	list_del(&va->head);
>> +	drm_mm_remove_node(&va->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_destroy_locked);
>> +
>> +/**
>> + * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
>> + * @va: the &drm_gpuva to destroy
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with and removes it from the underlying range allocator.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	list_del(&va->head);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +
>> +	drm_mm_remove_node(&va->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
>> +
>> +/**
>> + * drm_gpuva_find - find a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuva's address
>> + * @range: the &drm_gpuva's range
>> + *
>> + * Returns: the &drm_gpuva at a given @addr and with a given @range
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>> +	       u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva *va;
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
>> +		if (va->node.start == addr &&
>> +		    va->node.size == range)
>> +			return va;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find);
>> +
>> +/**
>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @start: the given GPU VA's start address
>> + *
>> + * Find the adjacent &drm_gpuva before the GPU VA with the given @start address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>> +{
>> +	struct drm_mm_node *node;
>> +
>> +	if (start <= mgr->mm_start ||
>> +	    start > (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
>> +	if (node == &mgr->va_mm.head_node)
>> +		return NULL;
>> +
>> +	return (struct drm_gpuva *)node;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_prev);
>> +
>> +/**
>> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @end: the given GPU VA's end address
>> + *
>> + * Find the adjacent &drm_gpuva after the GPU VA with the given @end address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
>> +{
>> +	struct drm_mm_node *node;
>> +
>> +	if (end < mgr->mm_start ||
>> +	    end >= (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
>> +	if (node == &mgr->va_mm.head_node)
>> +		return NULL;
>> +
>> +	return (struct drm_gpuva *)node;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_next);
>> +
>> +/**
>> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva_region in
>> + * @reg: the &drm_gpuva_region to insert
>> + * @addr: the start address of the GPU VA region
>> + * @range: the range of the GPU VA region
>> + *
>> + * Insert a &drm_gpuva_region with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			struct drm_gpuva_region *reg,
>> +			u64 addr, u64 range)
>> +{
>> +	int ret;
>> +
>> +	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
>> +					  range, 0,
>> +					  0, addr,
>> +					  addr + range,
>> +					  DRM_MM_INSERT_LOW|
>> +					  DRM_MM_INSERT_ONCE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	reg->mgr = mgr;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_insert);
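A sketch of how a driver might reserve a region before creating mappings in
it; that the driver itself marks the region sparse is an assumption here:

#include <drm/drm_gpuva_mgr.h>

static int example_alloc_region(struct drm_gpuva_manager *mgr,
				struct drm_gpuva_region *reg,
				u64 addr, u64 range, bool sparse)
{
	int ret;

	ret = drm_gpuva_region_insert(mgr, reg, addr, range);
	if (ret)
		return ret;

	/* assumption: the driver marks the region sparse itself */
	reg->sparse = sparse;

	return 0;
}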
>> +
>> +/**
>> + * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager holding the region
>> + * @reg: the &drm_gpuva_region to destroy
>> + *
>> + * This removes the given &reg from the underlying range allocator.
>> + */
>> +void
>> +drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>> +			 struct drm_gpuva_region *reg)
>> +{
>> +	struct drm_gpuva *va;
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr,
>> +				       reg->node.start,
>> +				       reg->node.size) {
>> +		WARN(1, "GPU VA region must be empty on destroy.\n");
>> +		return;
>> +	}
>> +
>> +	if (&reg->node == &mgr->kernel_alloc_node) {
>> +		WARN(1, "Can't destroy kernel reserved region.\n");
>> +		return;
>> +	}
>> +
>> +	drm_mm_remove_node(&reg->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_destroy);
>> +
>> +/**
>> + * drm_gpuva_region_find - find a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuva_region's address
>> + * @range: the &drm_gpuva_region's range
>> + *
>> + * Returns: the &drm_gpuva_region at a given @addr and with a given @range
>> + */
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>> +		      u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +
>> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
>> +		if (reg->node.start == addr &&
>> +		    reg->node.size == range)
>> +			return reg;
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_find);
>> +
>> +static int
>> +gpuva_op_map_new(struct drm_gpuva_op **pop,
>> +		 u64 addr, u64 range,
>> +		 struct drm_gem_object *obj, u64 offset)
>> +{
>> +	struct drm_gpuva_op *op;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_MAP;
>> +	op->map.va.addr = addr;
>> +	op->map.va.range = range;
>> +	op->map.gem.obj = obj;
>> +	op->map.gem.offset = offset;
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +gpuva_op_remap_new(struct drm_gpuva_op **pop,
>> +		   struct drm_gpuva_op_map *prev,
>> +		   struct drm_gpuva_op_map *next,
>> +		   struct drm_gpuva_op_unmap *unmap)
>> +{
>> +	struct drm_gpuva_op *op;
>> +	struct drm_gpuva_op_remap *r;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_REMAP;
>> +	r = &op->remap;
>> +
>> +	if (prev) {
>> +		r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
>> +		if (!r->prev)
>> +			goto err_free_op;
>> +	}
>> +
>> +	if (next) {
>> +		r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
>> +		if (!r->next)
>> +			goto err_free_prev;
>> +	}
>> +
>> +	r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
>> +	if (!r->unmap)
>> +		goto err_free_next;
>> +
>> +	return 0;
>> +
>> +err_free_next:
>> +	if (next)
>> +		kfree(r->next);
>> +err_free_prev:
>> +	if (prev)
>> +		kfree(r->prev);
>> +err_free_op:
>> +	kfree(op);
>> +	*pop = NULL;
>> +
>> +	return -ENOMEM;
>> +}
>> +
>> +static int
>> +gpuva_op_unmap_new(struct drm_gpuva_op **pop,
>> +		   struct drm_gpuva *va, bool merge)
>> +{
>> +	struct drm_gpuva_op *op;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_UNMAP;
>> +	op->unmap.va = va;
>> +	op->unmap.keep = merge;
>> +
>> +	return 0;
>> +}
>> +
>> +#define op_map_new_to_list(_ops, _addr, _range,		\
>> +			   _obj, _offset)		\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_map_new(&op, _addr, _range,	\
>> +			       _obj, _offset);		\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +#define op_remap_new_to_list(_ops, _prev, _next,	\
>> +			     _unmap)			\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_remap_new(&op, _prev, _next,	\
>> +				 _unmap);		\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +#define op_unmap_new_to_list(_ops, _gpuva, _merge)	\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_unmap_new(&op, _gpuva, _merge);	\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +/**
>> + * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>> + * @req_addr: the start address of the new mapping
>> + * @req_range: the range of the new mapping
>> + * @req_obj: the &drm_gem_object to map
>> + * @req_offset: the offset within the &drm_gem_object
>> + *
>> + * This function creates a list of operations to perform splitting and merging
>> + * of existent mapping(s) with the newly requested one.
>> + *
>> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>> + * in the given order. It can contain map, unmap and remap operations, but it
>> + * also can be empty if no operation is required, e.g. if the requested mapping
>> + * already exists in the exact same way.
>> + *
>> + * There can be an arbitrary number of unmap operations, a maximum of two remap
>> + * operations and a single map operation. The latter one, if existent,
>> + * represents the original map operation requested by the caller. Please note
>> + * that the map operation might have been modified, e.g. if it was
>> + * merged with an existent mapping.
>> + *
>> + * Note that before calling this function again with another mapping request it
>> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>> + * The previously obtained operations must be either processed or abandoned.
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * After the caller has finished processing the returned &drm_gpuva_ops, they
>> + * must be freed with drm_gpuva_ops_free().
>> + *
>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>> + */
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>> +			    u64 req_addr, u64 req_range,
>> +			    struct drm_gem_object *req_obj, u64 req_offset)
>> +{
>> +	struct drm_gpuva_ops *ops;
>> +	struct drm_gpuva *va, *prev = NULL;
>> +	u64 req_end = req_addr + req_range;
>> +	bool skip_pmerge = false, skip_nmerge = false;
>> +	int ret;
>> +
>> +	if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>> +	if (!ops)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	INIT_LIST_HEAD(&ops->list);
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +
>> +		/* Generally, we want to skip merging with potential mappings
>> +		 * left and right of the requested one when we found a
>> +		 * collision, since merging happens in this loop already.
>> +		 *
>> +		 * However, there is one exception when the requested mapping
>> +		 * spans into a free VM area. If this is the case we might
>> +		 * still hit the boundary of another mapping before and/or
>> +		 * after the free VM area.
>> +		 */
>> +		skip_pmerge = true;
>> +		skip_nmerge = true;
>> +
>> +		if (addr == req_addr) {
>> +			bool merge = obj == req_obj &&
>> +				     offset == req_offset;
>> +			if (end == req_end) {
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_unmap_new_to_list(&ops->list, va, false);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				skip_nmerge = false;
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = range - req_range,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + req_range,
>> +				};
>> +				struct drm_gpuva_op_unmap u = { .va = va };
>> +
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>> +				break;
>> +			}
>> +		} else if (addr < req_addr) {
>> +			u64 ls_range = req_addr - addr;
>> +			struct drm_gpuva_op_map p = {
>> +				.va.addr = addr,
>> +				.va.range = ls_range,
>> +				.gem.obj = obj,
>> +				.gem.offset = offset,
>> +			};
>> +			struct drm_gpuva_op_unmap u = { .va = va };
>> +			bool merge = obj == req_obj &&
>> +				     offset + ls_range == req_offset;
>> +
>> +			if (end == req_end) {
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				u64 new_addr = addr;
>> +				u64 new_range = req_range + ls_range;
>> +				u64 new_offset = offset;
>> +
>> +				/* We validated that the requested mapping is
>> +				 * within a single VA region already.
>> +				 * Since it overlaps the current mapping (which
>> +				 * can't cross a VA region boundary) we can be
>> +				 * sure that we're still within the boundaries
>> +				 * of the same VA region after merging.
>> +				 */
>> +				if (merge) {
>> +					req_offset = new_offset;
>> +					req_addr = new_addr;
>> +					req_range = new_range;
>> +					op_unmap_new_to_list(&ops->list, va, true);
>> +					goto next;
>> +				}
>> +
>> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = end - req_end,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + ls_range +
>> +						      req_range,
>> +				};
>> +
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, &p, &n, &u);
>> +				break;
>> +			}
>> +		} else if (addr > req_addr) {
>> +			bool merge = obj == req_obj &&
>> +				     offset == req_offset +
>> +					       (addr - req_addr);
>> +			if (!prev)
>> +				skip_pmerge = false;
>> +
>> +			if (end == req_end) {
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				skip_nmerge = false;
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = end - req_end,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + req_end - addr,
>> +				};
>> +				struct drm_gpuva_op_unmap u = { .va = va };
>> +				u64 new_end = end;
>> +				u64 new_range = new_end - req_addr;
>> +
>> +				/* We validated that the requested mapping is
>> +				 * within a single VA region already.
>> +				 * Since it overlaps the current mapping (which
>> +				 * can't cross a VA region boundary) we can be
>> +				 * sure that we're still within the boundaries
>> +				 * of the same VA region after merging.
>> +				 */
>> +				if (merge) {
>> +					req_end = new_end;
>> +					req_range = new_range;
>> +					op_unmap_new_to_list(&ops->list, va, true);
>> +					break;
>> +				}
>> +
>> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>> +				break;
>> +			}
>> +		}
>> +next:
>> +		prev = va;
>> +	}
>> +
>> +	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
>> +	if (va) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 new_offset = offset;
>> +		u64 new_addr = addr;
>> +		u64 new_range = req_range + range;
>> +		bool merge = obj == req_obj &&
>> +			     offset + range == req_offset;
>> +
>> +		/* Don't merge over VA region boundaries. */
>> +		merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
>> +		if (merge) {
>> +			op_unmap_new_to_list(&ops->list, va, true);
>> +
>> +			req_offset = new_offset;
>> +			req_addr = new_addr;
>> +			req_range = new_range;
>> +		}
>> +	}
>> +
>> +	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
>> +	if (va) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +		u64 new_range = req_range + range;
>> +		u64 new_end = end;
>> +		bool merge = obj == req_obj &&
>> +			     offset == req_offset + req_range;
>> +
>> +		/* Don't merge over VA region boundaries. */
>> +		merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
>> +		if (merge) {
>> +			op_unmap_new_to_list(&ops->list, va, true);
>> +
>> +			req_range = new_range;
>> +			req_end = new_end;
>> +		}
>> +	}
>> +
>> +	op_map_new_to_list(&ops->list,
>> +			   req_addr, req_range,
>> +			   req_obj, req_offset);
>> +
>> +done:
>> +	return ops;
>> +
>> +err_free_ops:
>> +	drm_gpuva_ops_free(ops);
>> +	return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
>> +
>> +#undef op_map_new_to_list
>> +#undef op_remap_new_to_list
>> +#undef op_unmap_new_to_list
>> +
>> +/**
>> + * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>> + * @req_addr: the start address of the range to unmap
>> + * @req_range: the range of the mappings to unmap
>> + *
>> + * This function creates a list of operations to perform unmapping and, if
>> + * required, splitting of the mappings overlapping the unmap range.
>> + *
>> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>> + * in the given order. It can contain unmap and remap operations, depending on
>> + * whether there are actual overlapping mappings to split.
>> + *
>> + * There can be an arbitrary number of unmap operations and a maximum of two
>> + * remap operations.
>> + *
>> + * Note that before calling this function again with another range to unmap it
>> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>> + * The previously obtained operations must be processed or abandoned.
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * After the caller has finished processing the returned &drm_gpuva_ops, they
>> + * must be freed with drm_gpuva_ops_free().
>> + *
>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>> + */
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>> +			      u64 req_addr, u64 req_range)
>> +{
>> +	struct drm_gpuva_ops *ops;
>> +	struct drm_gpuva_op *op;
>> +	struct drm_gpuva_op_remap *r;
>> +	struct drm_gpuva *va;
>> +	u64 req_end = req_addr + req_range;
>> +	int ret;
>> +
>> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>> +	if (!ops)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	INIT_LIST_HEAD(&ops->list);
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +
>> +		op = kzalloc(sizeof(*op), GFP_KERNEL);
>> +		if (!op) {
>> +			ret = -ENOMEM;
>> +			goto err_free_ops;
>> +		}
>> +
>> +		r = &op->remap;
>> +
>> +		if (addr < req_addr) {
>> +			r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
>> +			if (!r->prev) {
>> +				ret = -ENOMEM;
>> +				goto err_free_op;
>> +			}
>> +
>> +			r->prev->va.addr = addr;
>> +			r->prev->va.range = req_addr - addr;
>> +			r->prev->gem.obj = obj;
>> +			r->prev->gem.offset = offset;
>> +		}
>> +
>> +		if (end > req_end) {
>> +			r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
>> +			if (!r->next) {
>> +				ret = -ENOMEM;
>> +				goto err_free_prev;
>> +			}
>> +
>> +			r->next->va.addr = req_end;
>> +			r->next->va.range = end - req_end;
>> +			r->next->gem.obj = obj;
>> +			r->next->gem.offset = offset + (req_end - addr);
>> +		}
>> +
>> +		if (op->remap.prev || op->remap.next) {
>> +			op->op = DRM_GPUVA_OP_REMAP;
>> +			r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
>> +			if (!r->unmap) {
>> +				ret = -ENOMEM;
>> +				goto err_free_next;
>> +			}
>> +
>> +			r->unmap->va = va;
>> +		} else {
>> +			op->op = DRM_GPUVA_OP_UNMAP;
>> +			op->unmap.va = va;
>> +		}
>> +
>> +		list_add_tail(&op->entry, &ops->list);
>> +	}
>> +
>> +	return ops;
>> +
>> +err_free_next:
>> +	if (r->next)
>> +		kfree(r->next);
>> +err_free_prev:
>> +	if (r->prev)
>> +		kfree(r->prev);
>> +err_free_op:
>> +	kfree(op);
>> +err_free_ops:
>> +	drm_gpuva_ops_free(ops);
>> +	return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
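Analogous sketch for the unbind path (again with hypothetical example_*
naming; the actual page table handling is only hinted at in the comments):

#include <drm/drm_gpuva_mgr.h>
#include <linux/err.h>

static int example_vm_unbind(struct drm_gpuva_manager *mgr,
			     u64 addr, u64 range)
{
	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	ops = drm_gpuva_sm_unmap_ops_create(mgr, addr, range);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		if (op->op == DRM_GPUVA_OP_UNMAP) {
			/* mapping fully enclosed by [addr, addr + range):
			 * tear it down and drm_gpuva_destroy_unlocked() it
			 */
		} else if (op->op == DRM_GPUVA_OP_REMAP) {
			/* mapping crosses the range boundary: unmap
			 * op->remap.unmap->va and re-map the remaining
			 * op->remap.prev and/or op->remap.next part(s)
			 */
		}
	}

	drm_gpuva_ops_free(ops);
	return 0;
}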
>> +
>> +/**
>> + * drm_gpuva_ops_free - free the given &drm_gpuva_ops
>> + * @ops: the &drm_gpuva_ops to free
>> + *
>> + * Frees the given &drm_gpuva_ops structure including all the ops associated
>> + * with it.
>> + */
>> +void
>> +drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
>> +{
>> +	struct drm_gpuva_op *op, *next;
>> +
>> +	drm_gpuva_for_each_op_safe(op, next, ops) {
>> +		list_del(&op->entry);
>> +		if (op->op == DRM_GPUVA_OP_REMAP) {
>> +			if (op->remap.prev)
>> +				kfree(op->remap.prev);
>> +
>> +			if (op->remap.next)
>> +				kfree(op->remap.next);
>> +
>> +			kfree(op->remap.unmap);
>> +		}
>> +		kfree(op);
>> +	}
>> +
>> +	kfree(ops);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_ops_free);
>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>> index d7c521e8860f..6feacd93aca6 100644
>> --- a/include/drm/drm_drv.h
>> +++ b/include/drm/drm_drv.h
>> @@ -104,6 +104,12 @@ enum drm_driver_feature {
>>   	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
>>   	 */
>>   	DRIVER_COMPUTE_ACCEL            = BIT(7),
>> +	/**
>> +	 * @DRIVER_GEM_GPUVA:
>> +	 *
>> +	 * Driver supports user defined GPU VA bindings for GEM objects.
>> +	 */
>> +	DRIVER_GEM_GPUVA		= BIT(8),
>>   
>>   	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
>>   
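A driver opting in would advertise the feature in its &drm_driver, roughly
like this (hypothetical example_driver, all other members elided):

#include <drm/drm_drv.h>

static const struct drm_driver example_driver = {
	.driver_features = DRIVER_GEM | DRIVER_GEM_GPUVA,
};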
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 772a4adf5287..4a3679034966 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -36,6 +36,8 @@
>>   
>>   #include <linux/kref.h>
>>   #include <linux/dma-resv.h>
>> +#include <linux/list.h>
>> +#include <linux/mutex.h>
>>   
>>   #include <drm/drm_vma_manager.h>
>>   
>> @@ -337,6 +339,17 @@ struct drm_gem_object {
>>   	 */
>>   	struct dma_resv _resv;
>>   
>> +	/**
>> +	 * @gpuva:
>> +	 *
>> +	 * Provides the list and list mutex of GPU VAs attached to this
>> +	 * GEM object.
>> +	 */
>> +	struct {
>> +		struct list_head list;
>> +		struct mutex mutex;
>> +	} gpuva;
>> +
>>   	/**
>>   	 * @funcs:
>>   	 *
>> @@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
>>   unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
>>   			       bool (*shrink)(struct drm_gem_object *obj));
>>   
>> +/**
>> + * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
>> + * @obj: the &drm_gem_object
>> + *
>> + * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
>> + * protecting it.
>> + *
>> + * Calling this function is only necessary for drivers intending to support the
>> + * &drm_driver_feature DRIVER_GEM_GPUVA.
>> + */
>> +static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>> +{
>> +	INIT_LIST_HEAD(&obj->gpuva.list);
>> +	mutex_init(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
>> + * @obj: the &drm_gem_object
>> + *
>> + * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>> + */
>> +static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
>> +{
>> +	mutex_lock(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
>> + * @obj: the &drm_gem_object
>> + *
>> + * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>> + */
>> +static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
>> +{
>> +	mutex_unlock(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gem_object.
>> + */
>> +#define drm_gem_for_each_gpuva(entry, obj) \
>> +	list_for_each_entry(entry, &obj->gpuva.list, head)
>> +
>> +/**
>> + * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @next: another &drm_gpuva to store the next iteration step
>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
>> +	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
>> +
>>   #endif /* __DRM_GEM_H__ */
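For instance, dumping all mappings of a GEM object while holding its GPU VA
list mutex could look like this sketch:

#include <drm/drm_gem.h>
#include <drm/drm_gpuva_mgr.h>
#include <linux/printk.h>

static void example_gem_print_vas(struct drm_gem_object *obj)
{
	struct drm_gpuva *va;

	drm_gem_gpuva_lock(obj);
	drm_gem_for_each_gpuva(va, obj)
		pr_info("gpuva: [0x%llx, 0x%llx)\n",
			va->node.start, va->node.start + va->node.size);
	drm_gem_gpuva_unlock(obj);
}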
>> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
>> new file mode 100644
>> index 000000000000..adeb0c916e91
>> --- /dev/null
>> +++ b/include/drm/drm_gpuva_mgr.h
>> @@ -0,0 +1,527 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __DRM_GPUVA_MGR_H__
>> +#define __DRM_GPUVA_MGR_H__
>> +
>> +/*
>> + * Copyright (c) 2022 Red Hat.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <drm/drm_mm.h>
>> +#include <linux/mm.h>
>> +#include <linux/rbtree.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/types.h>
>> +
>> +struct drm_gpuva_region;
>> +struct drm_gpuva;
>> +struct drm_gpuva_ops;
>> +
>> +/**
>> + * struct drm_gpuva_manager - DRM GPU VA Manager
>> + *
>> + * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
>> + * the &drm_mm range allocator. Typically, this structure is embedded in bigger
>> + * driver structures.
>> + *
>> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
>> + * pages.
>> + *
>> + * There should be one manager instance per GPU virtual address space.
>> + */
>> +struct drm_gpuva_manager {
>> +	/**
>> +	 * @name: the name of the DRM GPU VA space
>> +	 */
>> +	const char *name;
>> +
>> +	/**
>> +	 * @mm_start: start of the VA space
>> +	 */
>> +	u64 mm_start;
>> +
>> +	/**
>> +	 * @mm_range: length of the VA space
>> +	 */
>> +	u64 mm_range;
>> +
>> +	/**
>> +	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
>> +	 */
>> +	struct drm_mm region_mm;
>> +
>> +	/**
>> +	 * @va_mm: the &drm_mm range allocator to track GPU VA mappings
>> +	 */
>> +	struct drm_mm va_mm;
>> +
>> +	/**
>> +	 * @kernel_alloc_node:
>> +	 *
>> +	 * &drm_mm_node representing the address space cutout reserved for
>> +	 * the kernel
>> +	 */
>> +	struct drm_mm_node kernel_alloc_node;
>> +};
>> +
>> +void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>> +			    const char *name,
>> +			    u64 start_offset, u64 range,
>> +			    u64 reserve_offset, u64 reserve_range);
>> +void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
>> +
>> +/**
>> + * struct drm_gpuva_region - structure to track a portion of GPU VA space
>> + *
>> + * This structure represents a portion of a GPU's VA space and is associated
>> + * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>> + *
>> + * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
>> + * placed within a &drm_gpuva_region.
>> + */
>> +struct drm_gpuva_region {
>> +	/**
>> +	 * @node: the &drm_mm_node to track the GPU VA region
>> +	 */
>> +	struct drm_mm_node node;
>> +
>> +	/**
>> +	 * @mgr: the &drm_gpuva_manager this object is associated with
>> +	 */
>> +	struct drm_gpuva_manager *mgr;
>> +
>> +	/**
>> +	 * @sparse: indicates whether this region is sparse
>> +	 */
>> +	bool sparse;
>> +};
>> +
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>> +		      u64 addr, u64 range);
>> +int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			    struct drm_gpuva_region *reg,
>> +			    u64 addr, u64 range);
>> +void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>> +			      struct drm_gpuva_region *reg);
>> +
>> +int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>> +		     struct drm_gpuva *va,
>> +		     u64 addr, u64 range);
>> +/**
>> + * drm_gpuva_for_each_region_in_range - iterator to walk over a range of nodes
>> + * @node__: &drm_gpuva_region structure to assign to in each iteration step
>> + * @gpuva__: &drm_gpuva_manager structure to walk
>> + * @start__: starting offset, the first node will overlap this
>> + * @end__: ending offset, the last node will start before this (but may overlap)
>> + *
>> + * This iterator walks over all nodes in the range allocator that lie
>> + * between @start and @end. It is implemented similarly to list_for_each(),
>> + * but is using &drm_mm's internal interval tree to accelerate the search for
>> + * the starting node, and hence isn't safe against removal of elements. It
>> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>> + * backing &drm_mm, and may even continue indefinitely.
>> + */
>> +#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
>> +	for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
>> +									 (start__), (end__)-1); \
>> +	     node__->node.start < (end__); \
>> +	     node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
>> +
>> +/**
>> + * drm_gpuva_for_each_region - iterator to walk over all regions
>> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva_region structures associated with the
>> + * &drm_gpuva_manager.
>> + */
>> +#define drm_gpuva_for_each_region(entry, gpuva) \
>> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>> +
>> +/**
>> + * drm_gpuva_for_each_region_safe - iterator to safely walk over all regions
>> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
>> + * @next: another &drm_gpuva_region to store the next iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva_region structures associated with the
>> + * &drm_gpuva_manager. It is implemented with list_for_each_entry_safe(), hence
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
>> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>> +
>> +
>> +/**
>> + * enum drm_gpuva_flags - flags for struct drm_gpuva
>> + */
>> +enum drm_gpuva_flags {
>> +	/**
>> +	 * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is swapped
>> +	 */
>> +	DRM_GPUVA_SWAPPED = (1 << 0),
>> +};
>> +
>> +/**
>> + * struct drm_gpuva - structure to track a GPU VA mapping
>> + *
>> + * This structure represents a GPU VA mapping and is associated with a
>> + * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>> + *
>> + * Typically, this structure is embedded in bigger driver structures.
>> + */
>> +struct drm_gpuva {
>> +	/**
>> +	 * @node: the &drm_mm_node to track the GPU VA mapping
>> +	 */
>> +	struct drm_mm_node node;
>> +
>> +	/**
>> +	 * @mgr: the &drm_gpuva_manager this object is associated with
>> +	 */
>> +	struct drm_gpuva_manager *mgr;
>> +
>> +	/**
>> +	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
>> +	 */
>> +	struct drm_gpuva_region *region;
>> +
>> +	/**
>> +	 * @head: the &list_head to attach this object to a &drm_gem_object
>> +	 */
>> +	struct list_head head;
>> +
>> +	/**
>> +	 * @flags: the &drm_gpuva_flags for this mapping
>> +	 */
>> +	enum drm_gpuva_flags flags;
>> +
>> +	/**
>> + * @gem: structure containing the &drm_gem_object and its offset
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @offset: the offset within the &drm_gem_object
>> +		 */
>> +		u64 offset;
>> +
>> +		/**
>> +		 * @obj: the mapped &drm_gem_object
>> +		 */
>> +		struct drm_gem_object *obj;
>> +	} gem;
>> +};
>> +
>> +void drm_gpuva_link_locked(struct drm_gpuva *va);
>> +void drm_gpuva_link_unlocked(struct drm_gpuva *va);
>> +void drm_gpuva_unlink_locked(struct drm_gpuva *va);
>> +void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
>> +
>> +void drm_gpuva_destroy_locked(struct drm_gpuva *va);
>> +void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
>> +
>> +struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
>> +				 u64 addr, u64 range);
>> +struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
>> +struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
>> +
>> +/**
>> + * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva is swapped
>> + * @va: the &drm_gpuva to set the swap flag of
>> + * @swap: indicates whether the &drm_gpuva is swapped
>> + */
>> +static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
>> +{
>> +	if (swap)
>> +		va->flags |= DRM_GPUVA_SWAPPED;
>> +	else
>> +		va->flags &= ~DRM_GPUVA_SWAPPED;
>> +}
>> +
>> +/**
>> + * drm_gpuva_swapped - indicates whether the backing BO of this &drm_gpuva
>> + * is swapped
>> + * @va: the &drm_gpuva to check
>> + */
>> +static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
>> +{
>> +	return va->flags & DRM_GPUVA_SWAPPED;
>> +}
>> +
>> +/**
>> + * drm_gpuva_for_each_va_in_range - iterator to walk over a range of nodes
>> + * @node__: &drm_gpuva structure to assign to in each iteration step
>> + * @gpuva__: &drm_gpuva_manager structure to walk
>> + * @start__: starting offset, the first node will overlap this
>> + * @end__: ending offset, the last node will start before this (but may overlap)
>> + *
>> + * This iterator walks over all nodes in the range allocator that lie
>> + * between @start and @end. It is implemented similarly to list_for_each(),
>> + * but is using &drm_mm's internal interval tree to accelerate the search for
>> + * the starting node, and hence isn't safe against removal of elements. It
>> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>> + * backing &drm_mm, and may even continue indefinitely.
>> + */
>> +#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
>> +	for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
>> +								  (start__), (end__)-1); \
>> +	     node__->node.start < (end__); \
>> +	     node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
>> +
>> +/**
>> + * drm_gpuva_for_each_va - iterator to walk over all mappings
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gpuva_manager.
>> + */
>> +#define drm_gpuva_for_each_va(entry, gpuva) \
>> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>> +
>> +/**
>> + * drm_gpuva_for_each_va_safe - iterator to safely walk over all mappings
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @next: another &drm_gpuva to store the next iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gpuva_manager. It is implemented with list_for_each_safe(), hence
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
>> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
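
As an illustration (not part of the patch), a driver could use the safe
variant to tear down all remaining mappings of a VA space; driver_free_va()
stands in for whatever the driver uses to free its embedding structure.

static void driver_cleanup_vas(struct drm_gpuva_manager *mgr)
{
    struct drm_gpuva *va, *next;

    drm_gpuva_for_each_va_safe(va, next, mgr) {
        /* Unlinks from the GEM's GPU VA list and removes the node. */
        drm_gpuva_destroy_unlocked(va);
        driver_free_va(va);    /* hypothetical */
    }
}
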
>> +
>> +/**
>> + * enum drm_gpuva_op_type - GPU VA operation type
>> + *
>> + * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager
>> + * can be map, remap or unmap operations.
>> + */
>> +enum drm_gpuva_op_type {
>> +	/**
>> +	 * @DRM_GPUVA_OP_MAP: the map op type
>> +	 */
>> +	DRM_GPUVA_OP_MAP,
>> +
>> +	/**
>> +	 * @DRM_GPUVA_OP_REMAP: the remap op type
>> +	 */
>> +	DRM_GPUVA_OP_REMAP,
>> +
>> +	/**
>> +	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
>> +	 */
>> +	DRM_GPUVA_OP_UNMAP,
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_map - GPU VA map operation
>> + *
>> + * This structure represents a single map operation generated by the
>> + * DRM GPU VA manager.
>> + */
>> +struct drm_gpuva_op_map {
>> +	/**
>> +	 * @va: structure containing address and range of a map
>> +	 * operation
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @addr: the base address of the new mapping
>> +		 */
>> +		u64 addr;
>> +
>> +		/**
>> +		 * @range: the range of the new mapping
>> +		 */
>> +		u64 range;
>> +	} va;
>> +
>> +	/**
>> +	 * @gem: structure containing the &drm_gem_object and its offset
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @offset: the offset within the &drm_gem_object
>> +		 */
>> +		u64 offset;
>> +
>> +		/**
>> +		 * @obj: the &drm_gem_object to map
>> +		 */
>> +		struct drm_gem_object *obj;
>> +	} gem;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_unmap - GPU VA unmap operation
>> + *
>> + * This structure represents a single unmap operation generated by the
>> + * DRM GPU VA manager.
>> + */
>> +struct drm_gpuva_op_unmap {
>> +	/**
>> +	 * @va: the &drm_gpuva to unmap
>> +	 */
>> +	struct drm_gpuva *va;
>> +
>> +	/**
>> +	 * @keep:
>> +	 *
>> +	 * Indicates whether this &drm_gpuva is physically contiguous with the
>> +	 * original mapping request.
>> +	 *
>> +	 * Optionally, if &keep is set, drivers may keep the actual page table
>> +	 * mappings for this &drm_gpuva, adding the missing page table entries
>> +	 * only and updating the &drm_gpuva_manager accordingly.
>> +	 */
>> +	bool keep;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_remap - GPU VA remap operation
>> + *
>> + * This represents a single remap operation generated by the DRM GPU VA manager.
>> + *
>> + * A remap operation is generated when an existing GPU VA mapping is split up
>> + * by inserting a new GPU VA mapping or by partially unmapping existent
>> + * mapping(s), hence it consists of a maximum of two map and one unmap
>> + * operation.
>> + *
>> + * The @unmap operation takes care of removing the original existing mapping.
>> + * @prev is used to remap the preceding part, @next the subsequent part.
>> + *
>> + * If the new mapping's start address matches the start address of the old
>> + * mapping, @prev is NULL; likewise, if the new mapping's end address matches
>> + * the end address of the old mapping, @next is NULL.
>> + *
>> + * Note, the reason for a dedicated remap operation, rather than arbitrary
>> + * unmap and map operations, is to give drivers the chance to extract driver
>> + * specific data for creating the new mappings from the unmap operation's
>> + * &drm_gpuva structure, which typically is embedded in a larger driver
>> + * specific structure.
>> + */
>> +struct drm_gpuva_op_remap {
>> +	/**
>> +	 * @prev: the preceding part of a split mapping
>> +	 */
>> +	struct drm_gpuva_op_map *prev;
>> +
>> +	/**
>> +	 * @next: the subsequent part of a split mapping
>> +	 */
>> +	struct drm_gpuva_op_map *next;
>> +
>> +	/**
>> +	 * @unmap: the unmap operation for the original existing mapping
>> +	 */
>> +	struct drm_gpuva_op_unmap *unmap;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op - GPU VA operation
>> + *
>> + * This structure represents a single generic operation, which can be either
>> + * map, unmap or remap.
>> + *
>> + * The particular type of the operation is defined by @op.
>> + */
>> +struct drm_gpuva_op {
>> +	/**
>> +	 * @entry:
>> +	 *
>> +	 * The &list_head used to distribute instances of this struct within
>> +	 * &drm_gpuva_ops.
>> +	 */
>> +	struct list_head entry;
>> +
>> +	/**
>> +	 * @op: the type of the operation
>> +	 */
>> +	enum drm_gpuva_op_type op;
>> +
>> +	union {
>> +		/**
>> +		 * @map: the map operation
>> +		 */
>> +		struct drm_gpuva_op_map map;
>> +
>> +		/**
>> +		 * @unmap: the unmap operation
>> +		 */
>> +		struct drm_gpuva_op_unmap unmap;
>> +
>> +		/**
>> +		 * @remap: the remap operation
>> +		 */
>> +		struct drm_gpuva_op_remap remap;
>> +	};
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
>> + */
>> +struct drm_gpuva_ops {
>> +	/**
>> +	 * @list: the &list_head
>> +	 */
>> +	struct list_head list;
>> +};
>> +
>> +/**
>> + * drm_gpuva_for_each_op - iterator to walk over all ops
>> + * @op: &drm_gpuva_op to assign in each iteration step
>> + * @ops: &drm_gpuva_ops to walk
>> + *
>> + * This iterator walks over all ops within a given list of operations.
>> + */
>> +#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
>> +
>> +/**
>> + * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
>> + * @op: &drm_gpuva_op to assign in each iteration step
>> + * @next: another &drm_gpuva_op to store the next iteration step
>> + * @ops: &drm_gpuva_ops to walk
>> + *
>> + * This iterator walks over all ops within a given list of operations. It is
>> + * implemented with list_for_each_safe(), hence it is safe against removal of
>> + * elements.
>> + */
>> +#define drm_gpuva_for_each_op_safe(op, next, ops) \
>> +	list_for_each_entry_safe(op, next, &(ops)->list, entry)
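
For reference, a minimal sketch (not part of the patch) of a driver consuming
such an ops list; the driver_handle_*() helpers are made up.

static int driver_process_ops(struct drm_gpuva_ops *ops)
{
    struct drm_gpuva_op *op;
    int ret = 0;

    drm_gpuva_for_each_op(op, ops) {
        switch (op->op) {
        case DRM_GPUVA_OP_MAP:
            ret = driver_handle_map(&op->map);        /* hypothetical */
            break;
        case DRM_GPUVA_OP_REMAP:
            ret = driver_handle_remap(&op->remap);    /* hypothetical */
            break;
        case DRM_GPUVA_OP_UNMAP:
            ret = driver_handle_unmap(&op->unmap);    /* hypothetical */
            break;
        }
        if (ret)
            break;
    }

    drm_gpuva_ops_free(ops);
    return ret;
}
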
>> +
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>> +			    u64 addr, u64 range,
>> +			    struct drm_gem_object *obj, u64 offset);
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>> +			      u64 addr, u64 range);
>> +void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
>> +
>> +#endif /* __DRM_GPUVA_MGR_H__ */
>> -- 
>> 2.39.0
>>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-02-06 13:35     ` Christian König
@ 2023-02-06 13:46       ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-06 13:46 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: daniel, airlied, bskeggs, jason, tzimmermann, mripard, corbet,
	nouveau, linux-kernel, dri-devel, linux-doc

On 2/6/23 14:35, Christian König wrote:
> Am 03.02.23 um 18:37 schrieb Matthew Brost:
>> On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
>>> This adds the infrastructure for a manager implementation to keep track
>>> of GPU virtual address (VA) mappings.
>>>
>>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>>> start implementing, allow userspace applications to request multiple and
>>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>>> intended to serve the following purposes in this context.
>>>
>>> 1) Provide a dedicated range allocator to track GPU VA allocations and
>>>     mappings, making use of the drm_mm range allocator.
>>>
>>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>>     particular DRM GEM objects.
>>>
>>> 3) Provide a common implementation to perform more complex mapping
>>>     operations on the GPU VA space. In particular splitting and merging
>>>     of GPU VA mappings, e.g. for intersecting mapping requests or 
>>> partial
>>>     unmap requests.
>>>
>> Over the past week I've hacked together a PoC port of Xe to GPUVA [1], so
>> far it seems really promising. It's 95% of the way to being feature
>> equivalent to the current Xe VM bind implementation, and I have line of
>> sight to getting sparse bindings implemented on top of GPUVA too. IMO
>> this has basically everything we need for Xe with a few tweaks.
>>
>> I am out until 2/14 but wanted to get my thoughts / suggestions out on
>> the list before I leave.
>>
>> 1. The GPUVA post didn't support the way Xe does userptrs - a NULL GEM. I
>> believe with [2], [3], and [4] GPUVA will support NULL GEMs. Also my
>> thinking is that sparse binds will also have NULL GEMs; more on sparse
>> bindings below.
>>
>> 2. I agree with Christian that drm_mm probably isn't what we want to
>> base the GPUVA implementation on, rather a RB tree or Maple tree has
>> been discussed. The implementation should be fairly easy to tune once we
>> have benchmarks running, so I'm not too concerned here as we can figure this
>> out down the line.
>>
>> 3. In Xe we want create xe_vm_op list which inherits from drm_gpuva_op
>> I've done this with a hack [5], I believe when we rebase we can do this
>> with a custom callback to allocate a large op size.
>>
>> 4. I'd like to add user bits to drm_gpuva_flags like I do in [6]. This is
>> similar to DMA_FENCE_FLAG_USER_BITS.
>>
>> 5. In Xe we have a VM prefetch operation which is needed for our compute
>> UMD with page faults. I'd like to add a prefetch type of operation like we do
>> in [7].
>>
>> 6. In Xe we have a VM unbind-all-mappings-for-a-GEM IOCTL; I'd like to add
>> support to generate this operation list to GPUVA like we do in [8].
>>
>> 7. I've thought about how Xe will implement sparse mappings (read 0,
>> writes dropped). My current thinking is a sparse mapping will be
>> represented as a drm_gpuva rather than a region like in Nouveau. Making
>> regions optional seems like a good idea to me, rather than forcing the
>> user of GPUVA code to create 1 large region for the manager as I
>> currently do in the Xe PoC.
> 
>  From Danilo's explanation I'm now pretty sure that regions won't work 
> for Nouveau either.
> 
> He seems to use an incorrect assumption about applications not changing 
> the sparse mappings behind a VkBuffer while the GPU is using it.
> 
> As far as I can see games like Forza won't work with this approach.
> 

I appreciate you sharing your concerns since they're seriously helping me to 
improve things and consider things I missed before. However, I'd prefer 
to wait for clarification before distributing them to other sub-threads. 
Depending on whether the understanding of where those concerns arise 
from turns out to be right or wrong, this might cause confusion for 
people not following *all* sub-threads.

>>
>> 8. Personally I'd like the caller to own the locking for GEM drm_gpuva
>> list (drm_gpuva_link_*, drm_gpuva_unlink_* functions). In Xe we almost
>> certainly will have the GEM dma-resv lock when we touch this list so an
>> extra lock here is redundant. Also it is kinda goofy that the caller owns the
>> locking for drm_gpuva insertion / removal but not the locking for this list.
>>
>> WRT Christian's thoughts on common uAPI rules for VM binds, I kinda
>> like that idea but I don't think that is necessary. All of our uAPI
>> should be close but also the GPUVA implementation should be flexible
>> enough to fit all of our needs and I think for the most part it is.
> 
> Maybe I should refine my concerns: A common component for GPUVM mappings 
> is a good idea, but we should not expect that to define any driver 
> independent UAPI.
> 
> If we want to define driver independent UAPI we should do so explicitly.
> 
> Christian.
> 
>>
>> Let me know what everything thinks about this. It would be great if when
>> I'm back on 2/14 I can rebase the Xe port to GPUVA on another version of
>> the GPUVA code and get sparse binding support implementation. Also I'd
>> like to get GPUVA merged in the Xe repo ASAP as our VM bind code badly
>> needed to be cleaned up and this was the push we needed to make this
>> happen.
>>
>> Matt
>>
>> [1] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314
>> [2] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=2ae21d7a3f52e5eb2c105ed8ae231471274bdc36
>> [3] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=49fca9f5d96201f5cbd1b19c7ff17eedfac65cdc
>> [4] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=61fa6b1e1f10e791ae82358fa971b04421d53024
>> [5] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=87fc08dcf0840e794b38269fe4c6a95d088d79ec
>> [6] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=a4826c22f6788bc29906ffa263c1cd3c4661fa77
>> [7] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=f008bbb55b213868e52c7b9cda4c1bfb95af6aee
>> [8] 
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=41f4f71c05d04d2b17d988dd95369b5df2d7f681
>>
>>> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>> ---
>>>   Documentation/gpu/drm-mm.rst    |   31 +
>>>   drivers/gpu/drm/Makefile        |    1 +
>>>   drivers/gpu/drm/drm_gem.c       |    3 +
>>>   drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
>>>   include/drm/drm_drv.h           |    6 +
>>>   include/drm/drm_gem.h           |   75 ++
>>>   include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
>>>   7 files changed, 1966 insertions(+)
>>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>>
>>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>>> index a52e6f4117d6..c9f120cfe730 100644
>>> --- a/Documentation/gpu/drm-mm.rst
>>> +++ b/Documentation/gpu/drm-mm.rst
>>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>>      :export:
>>> +DRM GPU VA Manager
>>> +==================
>>> +
>>> +Overview
>>> +--------
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>>> +   :doc: Overview
>>> +
>>> +Split and Merge
>>> +---------------
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>>> +   :doc: Split and Merge
>>> +
>>> +Locking
>>> +-------
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>>> +   :doc: Locking
>>> +
>>> +
>>> +DRM GPU VA Manager Function References
>>> +--------------------------------------
>>> +
>>> +.. kernel-doc:: include/drm/drm_gpuva_mgr.h
>>> +   :internal:
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>>> +   :export:
>>> +
>>>   DRM Buddy Allocator
>>>   ===================
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 4fe190aee584..de2ffca3b6e4 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -45,6 +45,7 @@ drm-y := \
>>>       drm_vblank.o \
>>>       drm_vblank_work.o \
>>>       drm_vma_manager.o \
>>> +    drm_gpuva_mgr.o \
>>>       drm_writeback.o
>>>   drm-$(CONFIG_DRM_LEGACY) += \
>>>       drm_agpsupport.o \
>>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>>> index 59a0bb5ebd85..65115fe88627 100644
>>> --- a/drivers/gpu/drm/drm_gem.c
>>> +++ b/drivers/gpu/drm/drm_gem.c
>>> @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct 
>>> drm_device *dev,
>>>       if (!obj->resv)
>>>           obj->resv = &obj->_resv;
>>> +    if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
>>> +        drm_gem_gpuva_init(obj);
>>> +
>>>       drm_vma_node_reset(&obj->vma_node);
>>>       INIT_LIST_HEAD(&obj->lru_node);
>>>   }
>>> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c 
>>> b/drivers/gpu/drm/drm_gpuva_mgr.c
>>> new file mode 100644
>>> index 000000000000..e665f642689d
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
>>> @@ -0,0 +1,1323 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (c) 2022 Red Hat.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person 
>>> obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, 
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom 
>>> the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be 
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + * Authors:
>>> + *     Danilo Krummrich <dakr@redhat.com>
>>> + *
>>> + */
>>> +
>>> +#include <drm/drm_gem.h>
>>> +#include <drm/drm_gpuva_mgr.h>
>>> +
>>> +/**
>>> + * DOC: Overview
>>> + *
>>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager 
>>> keeps track
>>> + * of a GPU's virtual address (VA) space and manages the 
>>> corresponding virtual
>>> + * mappings represented by &drm_gpuva objects. It also keeps track 
>>> of the
>>> + * mapping's backing &drm_gem_object buffers.
>>> + *
>>> + * &drm_gem_object buffers maintain a list (and a corresponding list 
>>> lock) of
>>> + * &drm_gpuva objects representing all existent GPU VA mappings 
>>> using this
>>> + * &drm_gem_object as backing buffer.
>>> + *
>>> + * A GPU VA mapping can only be created within a previously allocated
>>> + * &drm_gpuva_region, which represents a reserved portion of the GPU 
>>> VA space.
>>> + * GPU VA mappings are not allowed to span over a 
>>> &drm_gpuva_region's boundary.
>>> + *
>>> + * GPU VA regions can also be flagged as sparse, which allows 
>>> drivers to create
>>> + * sparse mappings for a whole GPU VA region in order to support Vulkan
>>> + * 'Sparse Resources'.
>>> + *
>>> + * The GPU VA manager internally uses the &drm_mm range allocator to 
>>> manage the
>>> + * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's 
>>> virtual address
>>> + * space.
>>> + *
>>> + * Besides the GPU VA space regions (&drm_gpuva_region) allocated by 
>>> a driver
>>> + * the &drm_gpuva_manager contains a special region representing the 
>>> portion of
>>> + * VA space reserved by the kernel. This node is initialized 
>>> together with the
>>> + * GPU VA manager instance and removed when the GPU VA manager is 
>>> destroyed.
>>> + *
>>> + * In a typical application drivers would embed struct drm_gpuva_manager,
>>> + * struct drm_gpuva_region and struct drm_gpuva within their own driver
>>> + * specific structures; this way the manager does not do any memory
>>> + * allocations of its own, nor any allocations of &drm_gpuva or
>>> + * &drm_gpuva_region entries.
>>> + */
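
To make the embedding scheme a bit more concrete (not part of the patch), the
driver-side structures could look roughly like this; all of the driver_* names
are made up.

struct driver_vm {
    struct drm_gpuva_manager base;
    struct mutex lock;    /* protects updates of the manager's view */
    /* ... driver specific VM state ... */
};

struct driver_va_region {
    struct drm_gpuva_region base;
    /* ... driver specific per-region state ... */
};

struct driver_va {
    struct drm_gpuva base;
    /* ... driver specific per-mapping state, e.g. page table handles ... */
};
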
>>> +
>>> +/**
>>> + * DOC: Split and Merge
>>> + *
>>> + * The DRM GPU VA manager also provides an algorithm implementing 
>>> splitting and
>>> + * merging of existent GPU VA mappings with the ones that are 
>>> requested to be
>>> + * mapped or unmapped. This feature is required by the Vulkan API to 
>>> implement
>>> + * Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to 
>>> this as
>>> + * VM BIND.
>>> + *
>>> + * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list 
>>> of map, unmap
>>> + * and remap operations for a given newly requested mapping. This list
>>> + * represents the set of operations to execute in order to integrate 
>>> the new
>>> + * mapping cleanly into the current state of the GPU VA space.
>>> + *
>>> + * Depending on how the new GPU VA mapping intersects with the 
>>> existent mappings
>>> + * of the GPU VA space the &drm_gpuva_ops contain an arbitrary 
>>> amount of unmap
>>> + * operations, a maximum of two remap operations and a single map 
>>> operation.
>>> + * The set of operations can also be empty if no operation is 
>>> required, e.g. if
>>> + * the requested mapping already exists in the exact same way.
>>> + *
>>> + * The single map operation, if existent, represents the original map
>>> + * operation requested by the caller. Please note that this operation might
>>> + * be altered compared to the original map operation, e.g. because it was
>>> + * merged with an already existent mapping. Hence, drivers must execute this
>>> + * map operation instead of the original one they passed to
>>> + * drm_gpuva_sm_map_ops_create().
>>> + *
>>> + * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
>>> + * &drm_gpuva to unmap is physically contiguous with the original mapping
>>> + * request. Optionally, if 'keep' is set, drivers may keep the actual page
>>> + * table entries for this &drm_gpuva, adding the missing page table entries
>>> + * only and updating the &drm_gpuva_manager's view of things accordingly.
>>> + *
>>> + * Drivers may do the same optimization, namely delta page table 
>>> updates, also
>>> + * for remap operations. This is possible since &drm_gpuva_op_remap 
>>> consists of
>>> + * one unmap operation and one or two map operations, such that 
>>> drivers can
>>> + * derive the page table update delta accordingly.
>>> + *
>>> + * Note that there can't be more than two existent mappings to split 
>>> up, one at
>>> + * the beginning and one at the end of the new mapping, hence there 
>>> is a
>>> + * maximum of two remap operations.
>>> + *
>>> + * Generally, the DRM GPU VA manager never merges mappings across the
>>> + * boundaries of &drm_gpuva_regions. This is the case since merging between
>>> + * GPU VA regions would result in unmap and map operations being issued for
>>> + * both regions involved, even though the original mapping request referred
>>> + * to one specific GPU VA region only. Since the other GPU VA region, the one
>>> + * not explicitly requested to be altered, might be in use by the GPU, we are
>>> + * not allowed to issue any map/unmap operations for this region.
>>> + *
>>> + * Note that before calling drm_gpuva_sm_map_ops_create() again with 
>>> another
>>> + * mapping request it is necessary to update the 
>>> &drm_gpuva_manager's view of
>>> + * the GPU VA space. The previously obtained operations must be 
>>> either fully
>>> + * processed or completely abandoned.
>>> + *
>>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>>> + * drm_gpuva_destroy_unlocked() should be used.
>>> + *
>>> + * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
>>> + * provides drivers the list of operations to be executed in order to unmap
>>> + * a range of GPU VA space. The logic behind this function is way simpler
>>> + * though: For all existent mappings enclosed by the given range, unmap
>>> + * operations are created. For mappings which are only partially located
>>> + * within the given range, remap operations are created such that those
>>> + * mappings are split up and re-mapped partially.
>>> + *
>>> + * The following diagrams depict the basic constellations of existent GPU VA
>>> + * mappings, a newly requested mapping and the resulting mappings as
>>> + * implemented by drm_gpuva_sm_map_ops_create() - they don't cover arbitrary
>>> + * combinations of those constellations.
>>> + *
>>> + * ::
>>> + *
>>> + *    1) Existent mapping is kept.
>>> + *    ----------------------------
>>> + *
>>> + *         0     a     1
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     a     1
>>> + *    req: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     a     1
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    2) Existent mapping is replaced.
>>> + *    --------------------------------
>>> + *
>>> + *         0     a     1
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     a     1
>>> + *    req: |-----------| (bo_offset=m)
>>> + *
>>> + *         0     a     1
>>> + *    new: |-----------| (bo_offset=m)
>>> + *
>>> + *
>>> + *    3) Existent mapping is replaced.
>>> + *    --------------------------------
>>> + *
>>> + *         0     a     1
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     b     1
>>> + *    req: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     b     1
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    4) Existent mapping is replaced.
>>> + *    --------------------------------
>>> + *
>>> + *         0  a  1
>>> + *    old: |-----|       (bo_offset=n)
>>> + *
>>> + *         0     a     2
>>> + *    req: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     a     2
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *    Note: We expect to see the same result for a request with a 
>>> different bo
>>> + *          and/or bo_offset.
>>> + *
>>> + *
>>> + *    5) Existent mapping is split.
>>> + *    -----------------------------
>>> + *
>>> + *         0     a     2
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *         0  b  1
>>> + *    req: |-----|       (bo_offset=n)
>>> + *
>>> + *         0  b  1  a' 2
>>> + *    new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>>> + *
>>> + *    Note: We expect to see the same result for a request with a 
>>> different bo
>>> + *          and/or non-contiguous bo_offset.
>>> + *
>>> + *
>>> + *    6) Existent mapping is kept.
>>> + *    ----------------------------
>>> + *
>>> + *         0     a     2
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *         0  a  1
>>> + *    req: |-----|       (bo_offset=n)
>>> + *
>>> + *         0     a     2
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    7) Existent mapping is split.
>>> + *    -----------------------------
>>> + *
>>> + *         0     a     2
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *               1  b  2
>>> + *    req:       |-----| (bo_offset=m)
>>> + *
>>> + *         0  a  1  b  2
>>> + *    new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>>> + *
>>> + *
>>> + *    8) Existent mapping is kept.
>>> + *    ----------------------------
>>> + *
>>> + *          0     a     2
>>> + *    old: |-----------| (bo_offset=n)
>>> + *
>>> + *               1  a  2
>>> + *    req:       |-----| (bo_offset=n+1)
>>> + *
>>> + *         0     a     2
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    9) Existent mapping is split.
>>> + *    -----------------------------
>>> + *
>>> + *         0     a     2
>>> + *    old: |-----------|       (bo_offset=n)
>>> + *
>>> + *               1     b     3
>>> + *    req:       |-----------| (bo_offset=m)
>>> + *
>>> + *         0  a  1     b     3
>>> + *    new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>>> + *
>>> + *
>>> + *    10) Existent mapping is merged.
>>> + *    -------------------------------
>>> + *
>>> + *         0     a     2
>>> + *    old: |-----------|       (bo_offset=n)
>>> + *
>>> + *               1     a     3
>>> + *    req:       |-----------| (bo_offset=n+1)
>>> + *
>>> + *         0        a        3
>>> + *    new: |-----------------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    11) Existent mapping is split.
>>> + *    ------------------------------
>>> + *
>>> + *         0        a        3
>>> + *    old: |-----------------| (bo_offset=n)
>>> + *
>>> + *               1  b  2
>>> + *    req:       |-----|       (bo_offset=m)
>>> + *
>>> + *         0  a  1  b  2  a' 3
>>> + *    new: |-----|-----|-----| 
>>> (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>>> + *
>>> + *
>>> + *    12) Existent mapping is kept.
>>> + *    -----------------------------
>>> + *
>>> + *         0        a        3
>>> + *    old: |-----------------| (bo_offset=n)
>>> + *
>>> + *               1  a  2
>>> + *    req:       |-----|       (bo_offset=n+1)
>>> + *
>>> + *         0        a        3
>>> + *    new: |-----------------| (bo_offset=n)
>>> + *
>>> + *
>>> + *    13) Existent mapping is replaced.
>>> + *    ---------------------------------
>>> + *
>>> + *               1  a  2
>>> + *    old:       |-----| (bo_offset=n)
>>> + *
>>> + *         0     a     2
>>> + *    req: |-----------| (bo_offset=n)
>>> + *
>>> + *         0     a     2
>>> + *    new: |-----------| (bo_offset=n)
>>> + *
>>> + *    Note: We expect to see the same result for a request with a 
>>> different bo
>>> + *          and/or non-contiguous bo_offset.
>>> + *
>>> + *
>>> + *    14) Existent mapping is replaced.
>>> + *    ---------------------------------
>>> + *
>>> + *               1  a  2
>>> + *    old:       |-----| (bo_offset=n)
>>> + *
>>> + *         0        a       3
>>> + *    req: |----------------| (bo_offset=n)
>>> + *
>>> + *         0        a       3
>>> + *    new: |----------------| (bo_offset=n)
>>> + *
>>> + *    Note: We expect to see the same result for a request with a 
>>> different bo
>>> + *          and/or non-contiguous bo_offset.
>>> + *
>>> + *
>>> + *    15) Existent mapping is split.
>>> + *    ------------------------------
>>> + *
>>> + *               1     a     3
>>> + *    old:       |-----------| (bo_offset=n)
>>> + *
>>> + *         0     b     2
>>> + *    req: |-----------|       (bo_offset=m)
>>> + *
>>> + *         0     b     2  a' 3
>>> + *    new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>>> + *
>>> + *
>>> + *    16) Existent mappings are merged.
>>> + *    ---------------------------------
>>> + *
>>> + *         0     a     1
>>> + *    old: |-----------|                        (bo_offset=n)
>>> + *
>>> + *                                2     a     3
>>> + *    old':                       |-----------| (bo_offset=n+2)
>>> + *
>>> + *                    1     a     2
>>> + *    req:            |-----------|             (bo_offset=n+1)
>>> + *
>>> + *                          a
>>> + *    new: |----------------------------------| (bo_offset=n)
>>> + */
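
A condensed usage sketch (not part of the patch, error handling reduced):
request the ops for a new mapping, process them in order, then free them.
driver_apply_op() is made up and is expected to update the manager's view via
drm_gpuva_insert() / drm_gpuva_destroy_*() as it goes.

static int driver_vm_bind(struct drm_gpuva_manager *mgr,
                          u64 addr, u64 range,
                          struct drm_gem_object *obj, u64 offset)
{
    struct drm_gpuva_ops *ops;
    struct drm_gpuva_op *op;
    int ret = 0;

    ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
    if (IS_ERR(ops))
        return PTR_ERR(ops);

    drm_gpuva_for_each_op(op, ops) {
        ret = driver_apply_op(mgr, op);    /* hypothetical */
        if (ret)
            break;
    }

    drm_gpuva_ops_free(ops);
    return ret;
}
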
>>> +
>>> +/**
>>> + * DOC: Locking
>>> + *
>>> + * Generally, the GPU VA manager does not take care of locking itself; it is
>>> + * the driver's responsibility to take care of locking. Drivers might want to
>>> + * protect the following operations: inserting, destroying and iterating
>>> + * &drm_gpuva and &drm_gpuva_region objects as well as generating split and
>>> + * merge operations.
>>> + *
>>> + * The GPU VA manager does take care of the locking of the backing
>>> + * &drm_gem_object buffers GPU VA lists though, unless the provided 
>>> functions
>>> + * documentation claims otherwise.
>>> + */
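
In other words (sketch only, not part of the patch; driver_vm / driver_va
embed the GPUVA objects as sketched above, and vm->lock is a made-up driver
mutex): the driver serializes updates of the manager's view itself, while the
*_unlocked() helpers take the GEM's GPU VA list lock internally.

static int driver_bind_va(struct driver_vm *vm, struct driver_va *va,
                          u64 addr, u64 range)
{
    int ret;

    mutex_lock(&vm->lock);
    ret = drm_gpuva_insert(&vm->base, &va->base, addr, range);
    mutex_unlock(&vm->lock);
    if (ret)
        return ret;

    /* Takes the backing GEM's GPU VA list lock internally. */
    drm_gpuva_link_unlocked(&va->base);

    return 0;
}
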
>>> +
>>> +/**
>>> + * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
>>> + * @mgr: pointer to the &drm_gpuva_manager to initialize
>>> + * @name: the name of the GPU VA space
>>> + * @start_offset: the start offset of the GPU VA space
>>> + * @range: the size of the GPU VA space
>>> + * @reserve_offset: the start of the kernel reserved GPU VA area
>>> + * @reserve_range: the size of the kernel reserved GPU VA area
>>> + *
>>> + * The &drm_gpuva_manager must be initialized with this function 
>>> before use.
>>> + *
>>> + * Note that @mgr must be cleared to 0 before calling this function. 
>>> The given
>>> + * &name is expected to be managed by the surrounding driver 
>>> structures.
>>> + */
>>> +void
>>> +drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>>> +               const char *name,
>>> +               u64 start_offset, u64 range,
>>> +               u64 reserve_offset, u64 reserve_range)
>>> +{
>>> +    drm_mm_init(&mgr->va_mm, start_offset, range);
>>> +    drm_mm_init(&mgr->region_mm, start_offset, range);
>>> +
>>> +    mgr->mm_start = start_offset;
>>> +    mgr->mm_range = range;
>>> +
>>> +    mgr->name = name ? name : "unknown";
>>> +
>>> +    memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
>>> +    mgr->kernel_alloc_node.start = reserve_offset;
>>> +    mgr->kernel_alloc_node.size = reserve_range;
>>> +    drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_manager_init);
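
For instance (made-up numbers, not part of the patch), a driver managing a
48 bit VA space that reserves the first 4 GiB for the kernel could do:

    /* Assumes a zeroed struct drm_gpuva_manager embedded in a driver struct. */
    drm_gpuva_manager_init(&vm->base, "example",
                           0, 1ULL << 48,    /* managed VA space */
                           0, SZ_4G);        /* kernel reserved area */
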
>>> +
>>> +/**
>>> + * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
>>> + * @mgr: pointer to the &drm_gpuva_manager to clean up
>>> + *
>>> + * Note that it is a bug to call this function on a manager that still
>>> + * holds GPU VA mappings.
>>> + */
>>> +void
>>> +drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>>> +{
>>> +    mgr->name = NULL;
>>> +    drm_mm_remove_node(&mgr->kernel_alloc_node);
>>> +    drm_mm_takedown(&mgr->va_mm);
>>> +    drm_mm_takedown(&mgr->region_mm);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>>> +
>>> +static struct drm_gpuva_region *
>>> +drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>>> +{
>>> +    struct drm_gpuva_region *reg;
>>> +
>>> +    /* Find the VA region the requested range is strictly enclosed 
>>> by. */
>>> +    drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
>>> +        if (reg->node.start <= addr &&
>>> +            reg->node.start + reg->node.size >= addr + range &&
>>> +            &reg->node != &mgr->kernel_alloc_node)
>>> +            return reg;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static bool
>>> +drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 
>>> range)
>>> +{
>>> +    return !!drm_gpuva_in_region(mgr, addr, range);
>>> +}
>>> +
>>> +/**
>>> + * drm_gpuva_insert - insert a &drm_gpuva
>>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>>> + * @va: the &drm_gpuva to insert
>>> + * @addr: the start address of the GPU VA
>>> + * @range: the range of the GPU VA
>>> + *
>>> + * Insert a &drm_gpuva with a given address and range into a
>>> + * &drm_gpuva_manager.
>>> + *
>>> + * The function assumes the caller does not hold the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>>> +         struct drm_gpuva *va,
>>> +         u64 addr, u64 range)
>>> +{
>>> +    struct drm_gpuva_region *reg;
>>> +    int ret;
>>> +
>>> +    if (!va->gem.obj)
>>> +        return -EINVAL;
>>> +
>>> +    reg = drm_gpuva_in_region(mgr, addr, range);
>>> +    if (!reg)
>>> +        return -EINVAL;
>>> +
>>> +    ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
>>> +                      range, 0,
>>> +                      0, addr,
>>> +                      addr + range,
>>> +                      DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    va->mgr = mgr;
>>> +    va->region = reg;
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_insert);
>>> +
>>> +/**
>>> + * drm_gpuva_link_locked - link a &drm_gpuva
>>> + * @va: the &drm_gpuva to link
>>> + *
>>> + * This adds the given &va to the GPU VA list of the &drm_gem_object 
>>> it is
>>> + * associated with.
>>> + *
>>> + * The function assumes the caller already holds the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_link_locked(struct drm_gpuva *va)
>>> +{
>>> +    lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>>> +    list_add_tail(&va->head, &va->gem.obj->gpuva.list);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_link_locked);
>>> +
>>> +/**
>>> + * drm_gpuva_link_unlocked - link a &drm_gpuva
>>> + * @va: the &drm_gpuva to link
>>> + *
>>> + * This adds the given &va to the GPU VA list of the &drm_gem_object 
>>> it is
>>> + * associated with.
>>> + *
>>> + * The function assumes the caller does not hold the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_link_unlocked(struct drm_gpuva *va)
>>> +{
>>> +    drm_gem_gpuva_lock(va->gem.obj);
>>> +    drm_gpuva_link_locked(va);
>>> +    drm_gem_gpuva_unlock(va->gem.obj);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_link_unlocked);
>>> +
>>> +/**
>>> + * drm_gpuva_unlink_locked - unlink a &drm_gpuva
>>> + * @va: the &drm_gpuva to unlink
>>> + *
>>> + * This removes the given &va from the GPU VA list of the 
>>> &drm_gem_object it is
>>> + * associated with.
>>> + *
>>> + * The function assumes the caller already holds the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_unlink_locked(struct drm_gpuva *va)
>>> +{
>>> +    lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>>> +    list_del_init(&va->head);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_unlink_locked);
>>> +
>>> +/**
>>> + * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
>>> + * @va: the &drm_gpuva to unlink
>>> + *
>>> + * This removes the given &va from the GPU VA list of the 
>>> &drm_gem_object it is
>>> + * associated with.
>>> + *
>>> + * The function assumes the caller does not hold the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
>>> +{
>>> +    drm_gem_gpuva_lock(va->gem.obj);
>>> +    drm_gpuva_unlink_locked(va);
>>> +    drm_gem_gpuva_unlock(va->gem.obj);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
>>> +
>>> +/**
>>> + * drm_gpuva_destroy_locked - destroy a &drm_gpuva
>>> + * @va: the &drm_gpuva to destroy
>>> + *
>>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it
>>> + * is associated with and removes it from the underlying range allocator.
>>> + *
>>> + * The function assumes the caller already holds the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_destroy_locked(struct drm_gpuva *va)
>>> +{
>>> +    lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>>> +
>>> +    list_del(&va->head);
>>> +    drm_mm_remove_node(&va->node);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_destroy_locked);
>>> +
>>> +/**
>>> + * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
>>> + * @va: the &drm_gpuva to destroy
>>> + *
>>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it
>>> + * is associated with and removes it from the underlying range allocator.
>>> + *
>>> + * The function assumes the caller does not hold the &drm_gem_object's
>>> + * GPU VA list mutex.
>>> + */
>>> +void
>>> +drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
>>> +{
>>> +    drm_gem_gpuva_lock(va->gem.obj);
>>> +    list_del(&va->head);
>>> +    drm_gem_gpuva_unlock(va->gem.obj);
>>> +
>>> +    drm_mm_remove_node(&va->node);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
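
As an illustration of the 'keep' semantics together with these helpers (not
part of the patch): a driver-side unmap handler could look roughly like this,
with driver_zap_ptes() made up and the &drm_gpuva assumed to be allocated with
kzalloc().

static void driver_handle_unmap(struct drm_gpuva_op_unmap *u)
{
    struct drm_gpuva *va = u->va;

    /* If 'keep' is set the page table entries may stay in place; only the
     * missing ones need to be added on the corresponding map side.
     */
    if (!u->keep)
        driver_zap_ptes(va);    /* hypothetical */

    /* Drop the mapping from both the GEM's and the manager's view. */
    drm_gpuva_destroy_unlocked(va);
    kfree(va);
}
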
>>> +
>>> +/**
>>> + * drm_gpuva_find - find a &drm_gpuva
>>> + * @mgr: the &drm_gpuva_manager to search in
>>> + * @addr: the &drm_gpuvas address
>>> + * @range: the &drm_gpuvas range
>>> + *
>>> + * Returns: the &drm_gpuva at a given &addr and with a given &range
>>> + */
>>> +struct drm_gpuva *
>>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>>> +           u64 addr, u64 range)
>>> +{
>>> +    struct drm_gpuva *va;
>>> +
>>> +    drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
>>> +        if (va->node.start == addr &&
>>> +            va->node.size == range)
>>> +            return va;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_find);
>>> +
>>> +/**
>>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>>> + * @mgr: the &drm_gpuva_manager to search in
>>> + * @start: the given GPU VA's start address
>>> + *
>>> + * Find the adjacent &drm_gpuva before the GPU VA with given &start 
>>> address.
>>> + *
>>> + * Note that if there is any free space between the GPU VA mappings 
>>> no mapping
>>> + * is returned.
>>> + *
>>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>>> + */
>>> +struct drm_gpuva *
>>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>>> +{
>>> +    struct drm_mm_node *node;
>>> +
>>> +    if (start <= mgr->mm_start ||
>>> +        start > (mgr->mm_start + mgr->mm_range))
>>> +        return NULL;
>>> +
>>> +    node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
>>> +    if (node == &mgr->va_mm.head_node)
>>> +        return NULL;
>>> +
>>> +    return (struct drm_gpuva *)node;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_find_prev);
>>> +
>>> +/**
>>> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
>>> + * @mgr: the &drm_gpuva_manager to search in
>>> + * @end: the given GPU VA's end address
>>> + *
>>> + * Find the adjacent &drm_gpuva after the GPU VA with given &end 
>>> address.
>>> + *
>>> + * Note that if there is any free space between the GPU VA mappings 
>>> no mapping
>>> + * is returned.
>>> + *
>>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>>> + */
>>> +struct drm_gpuva *
>>> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
>>> +{
>>> +    struct drm_mm_node *node;
>>> +
>>> +    if (end < mgr->mm_start ||
>>> +        end >= (mgr->mm_start + mgr->mm_range))
>>> +        return NULL;
>>> +
>>> +    node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
>>> +    if (node == &mgr->va_mm.head_node)
>>> +        return NULL;
>>> +
>>> +    return (struct drm_gpuva *)node;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_find_next);
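
A possible use of these lookups (sketch, not part of the patch): check whether
the mapping right before a given address is backed by the same BO and
contiguous in its offset, i.e. a merge candidate.

static bool driver_can_merge_prev(struct drm_gpuva_manager *mgr, u64 addr,
                                  struct drm_gem_object *obj, u64 offset)
{
    struct drm_gpuva *prev = drm_gpuva_find_prev(mgr, addr);

    return prev && prev->gem.obj == obj &&
           prev->gem.offset + prev->node.size == offset;
}
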
>>> +
>>> +/**
>>> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
>>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>>> + * @reg: the &drm_gpuva_region to insert
>>> + * @addr: the start address of the GPU VA
>>> + * @range: the range of the GPU VA
>>> + *
>>> + * Insert a &drm_gpuva_region with a given address and range into a
>>> + * &drm_gpuva_manager.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>>> +            struct drm_gpuva_region *reg,
>>> +            u64 addr, u64 range)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
>>> +                      range, 0,
>>> +                      0, addr,
>>> +                      addr + range,
>>> +                      DRM_MM_INSERT_LOW|
>>> +                      DRM_MM_INSERT_ONCE);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    reg->mgr = mgr;
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_region_insert);
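
For completeness, a small sketch (not part of the patch) of reserving a
userspace-requested VA region before any mappings are created in it; allocating
and tracking the region object is up to the driver.

static int driver_alloc_region(struct drm_gpuva_manager *mgr,
                               u64 addr, u64 range)
{
    struct drm_gpuva_region *reg;
    int ret;

    reg = kzalloc(sizeof(*reg), GFP_KERNEL);
    if (!reg)
        return -ENOMEM;

    ret = drm_gpuva_region_insert(mgr, reg, addr, range);
    if (ret)
        kfree(reg);

    return ret;
}
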
>>> +
>>> +/**
>>> + * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
>>> + * @mgr: the &drm_gpuva_manager holding the region
>>> + * @reg: the &drm_gpuva to destroy
>>> + *
>>> + * This removes the given &reg from the underlying range allocator.
>>> + */
>>> +void
>>> +drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>>> +             struct drm_gpuva_region *reg)
>>> +{
>>> +    struct drm_gpuva *va;
>>> +
>>> +    drm_gpuva_for_each_va_in_range(va, mgr,
>>> +                       reg->node.start,
>>> +                       reg->node.size) {
>>> +        WARN(1, "GPU VA region must be empty on destroy.\n");
>>> +        return;
>>> +    }
>>> +
>>> +    if (&reg->node == &mgr->kernel_alloc_node) {
>>> +        WARN(1, "Can't destroy kernel reserved region.\n");
>>> +        return;
>>> +    }
>>> +
>>> +    drm_mm_remove_node(&reg->node);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_region_destroy);
>>> +
>>> +/**
>>> + * drm_gpuva_region_find - find a &drm_gpuva_region
>>> + * @mgr: the &drm_gpuva_manager to search in
>>> + * @addr: the &drm_gpuva_regions address
>>> + * @range: the &drm_gpuva_regions range
>>> + *
>>> + * Returns: the &drm_gpuva_region at a given &addr and with a given 
>>> &range
>>> + */
>>> +struct drm_gpuva_region *
>>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>>> +              u64 addr, u64 range)
>>> +{
>>> +    struct drm_gpuva_region *reg;
>>> +
>>> +    drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
>>> +        if (reg->node.start == addr &&
>>> +            reg->node.size == range)
>>> +            return reg;
>>> +
>>> +    return NULL;
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_region_find);
>>> +
>>> +static int
>>> +gpuva_op_map_new(struct drm_gpuva_op **pop,
>>> +         u64 addr, u64 range,
>>> +         struct drm_gem_object *obj, u64 offset)
>>> +{
>>> +    struct drm_gpuva_op *op;
>>> +
>>> +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>>> +    if (!op)
>>> +        return -ENOMEM;
>>> +
>>> +    op->op = DRM_GPUVA_OP_MAP;
>>> +    op->map.va.addr = addr;
>>> +    op->map.va.range = range;
>>> +    op->map.gem.obj = obj;
>>> +    op->map.gem.offset = offset;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +gpuva_op_remap_new(struct drm_gpuva_op **pop,
>>> +           struct drm_gpuva_op_map *prev,
>>> +           struct drm_gpuva_op_map *next,
>>> +           struct drm_gpuva_op_unmap *unmap)
>>> +{
>>> +    struct drm_gpuva_op *op;
>>> +    struct drm_gpuva_op_remap *r;
>>> +
>>> +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>>> +    if (!op)
>>> +        return -ENOMEM;
>>> +
>>> +    op->op = DRM_GPUVA_OP_REMAP;
>>> +    r = &op->remap;
>>> +
>>> +    if (prev) {
>>> +        r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
>>> +        if (!r->prev)
>>> +            goto err_free_op;
>>> +    }
>>> +
>>> +    if (next) {
>>> +        r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
>>> +        if (!r->next)
>>> +            goto err_free_prev;
>>> +    }
>>> +
>>> +    r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
>>> +    if (!r->unmap)
>>> +        goto err_free_next;
>>> +
>>> +    return 0;
>>> +
>>> +err_free_next:
>>> +    if (next)
>>> +        kfree(r->next);
>>> +err_free_prev:
>>> +    if (prev)
>>> +        kfree(r->prev);
>>> +err_free_op:
>>> +    kfree(op);
>>> +    *pop = NULL;
>>> +
>>> +    return -ENOMEM;
>>> +}
>>> +
>>> +static int
>>> +gpuva_op_unmap_new(struct drm_gpuva_op **pop,
>>> +           struct drm_gpuva *va, bool merge)
>>> +{
>>> +    struct drm_gpuva_op *op;
>>> +
>>> +    op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>>> +    if (!op)
>>> +        return -ENOMEM;
>>> +
>>> +    op->op = DRM_GPUVA_OP_UNMAP;
>>> +    op->unmap.va = va;
>>> +    op->unmap.keep = merge;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +#define op_map_new_to_list(_ops, _addr, _range,        \
>>> +               _obj, _offset)        \
>>> +do {                            \
>>> +    struct drm_gpuva_op *op;            \
>>> +                            \
>>> +    ret = gpuva_op_map_new(&op, _addr, _range,    \
>>> +                   _obj, _offset);        \
>>> +    if (ret)                    \
>>> +        goto err_free_ops;            \
>>> +                            \
>>> +    list_add_tail(&op->entry, _ops);        \
>>> +} while (0)
>>> +
>>> +#define op_remap_new_to_list(_ops, _prev, _next,    \
>>> +                 _unmap)            \
>>> +do {                            \
>>> +    struct drm_gpuva_op *op;            \
>>> +                            \
>>> +    ret = gpuva_op_remap_new(&op, _prev, _next,    \
>>> +                 _unmap);        \
>>> +    if (ret)                    \
>>> +        goto err_free_ops;            \
>>> +                            \
>>> +    list_add_tail(&op->entry, _ops);        \
>>> +} while (0)
>>> +
>>> +#define op_unmap_new_to_list(_ops, _gpuva, _merge)    \
>>> +do {                            \
>>> +    struct drm_gpuva_op *op;            \
>>> +                            \
>>> +    ret = gpuva_op_unmap_new(&op, _gpuva, _merge);    \
>>> +    if (ret)                    \
>>> +        goto err_free_ops;            \
>>> +                            \
>>> +    list_add_tail(&op->entry, _ops);        \
>>> +} while (0)
>>> +
>>> +/**
>>> + * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split 
>>> and merge
>>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>>> + * @req_addr: the start address of the new mapping
>>> + * @req_range: the range of the new mapping
>>> + * @req_obj: the &drm_gem_object to map
>>> + * @req_offset: the offset within the &drm_gem_object
>>> + *
>>> + * This function creates a list of operations to perform splitting 
>>> and merging
>>> + * of existent mapping(s) with the newly requested one.
>>> + *
>>> + * The list can be iterated with &drm_gpuva_for_each_op and must be 
>>> processed
>>> + * in the given order. It can contain map, unmap and remap 
>>> operations, but it
>>> + * also can be empty if no operation is required, e.g. if the requested
>>> + * mapping already exists in the exact same way.
>>> + *
>>> + * There can be an arbitrary amount of unmap operations, a maximum 
>>> of two remap
>>> + * operations and a single map operation. The latter one, if existent,
>>> + * represents the original map operation requested by the caller. Please
>>> + * note that the map operation might have been modified, e.g. if it was
>>> + * merged with an existent mapping.
>>> + *
>>> + * Note that before calling this function again with another mapping 
>>> request it
>>> + * is necessary to update the &drm_gpuva_manager's view of the GPU 
>>> VA space.
>>> + * The previously obtained operations must be either processed or 
>>> abandoned.
>>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>>> + * drm_gpuva_destroy_unlocked() should be used.
>>> + *
>>> + * After the caller finished processing the returned &drm_gpuva_ops, 
>>> they must
>>> + * be freed with &drm_gpuva_ops_free.
>>> + *
>>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR 
>>> on failure
>>> + */
>>> +struct drm_gpuva_ops *
>>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>>> +                u64 req_addr, u64 req_range,
>>> +                struct drm_gem_object *req_obj, u64 req_offset)
>>> +{
>>> +    struct drm_gpuva_ops *ops;
>>> +    struct drm_gpuva *va, *prev = NULL;
>>> +    u64 req_end = req_addr + req_range;
>>> +    bool skip_pmerge = false, skip_nmerge = false;
>>> +    int ret;
>>> +
>>> +    if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
>>> +        return ERR_PTR(-EINVAL);
>>> +
>>> +    ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>>> +    if (!ops)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    INIT_LIST_HEAD(&ops->list);
>>> +
>>> +    drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>>> +        struct drm_gem_object *obj = va->gem.obj;
>>> +        u64 offset = va->gem.offset;
>>> +        u64 addr = va->node.start;
>>> +        u64 range = va->node.size;
>>> +        u64 end = addr + range;
>>> +
>>> +        /* Generally, we want to skip merging with potential mappings
>>> +         * left and right of the requested one when we found a
>>> +         * collision, since merging happens in this loop already.
>>> +         *
>>> +         * However, there is one exception when the requested mapping
>>> +         * spans into a free VM area. If this is the case we might
>>> +         * still hit the boundary of another mapping before and/or
>>> +         * after the free VM area.
>>> +         */
>>> +        skip_pmerge = true;
>>> +        skip_nmerge = true;
>>> +
>>> +        if (addr == req_addr) {
>>> +            bool merge = obj == req_obj &&
>>> +                     offset == req_offset;
>>> +            if (end == req_end) {
>>> +                if (merge)
>>> +                    goto done;
>>> +
>>> +                op_unmap_new_to_list(&ops->list, va, false);
>>> +                break;
>>> +            }
>>> +
>>> +            if (end < req_end) {
>>> +                skip_nmerge = false;
>>> +                op_unmap_new_to_list(&ops->list, va, merge);
>>> +                goto next;
>>> +            }
>>> +
>>> +            if (end > req_end) {
>>> +                struct drm_gpuva_op_map n = {
>>> +                    .va.addr = req_end,
>>> +                    .va.range = range - req_range,
>>> +                    .gem.obj = obj,
>>> +                    .gem.offset = offset + req_range,
>>> +                };
>>> +                struct drm_gpuva_op_unmap u = { .va = va };
>>> +
>>> +                if (merge)
>>> +                    goto done;
>>> +
>>> +                op_remap_new_to_list(&ops->list, NULL, &n, &u);
>>> +                break;
>>> +            }
>>> +        } else if (addr < req_addr) {
>>> +            u64 ls_range = req_addr - addr;
>>> +            struct drm_gpuva_op_map p = {
>>> +                .va.addr = addr,
>>> +                .va.range = ls_range,
>>> +                .gem.obj = obj,
>>> +                .gem.offset = offset,
>>> +            };
>>> +            struct drm_gpuva_op_unmap u = { .va = va };
>>> +            bool merge = obj == req_obj &&
>>> +                     offset + ls_range == req_offset;
>>> +
>>> +            if (end == req_end) {
>>> +                if (merge)
>>> +                    goto done;
>>> +
>>> +                op_remap_new_to_list(&ops->list, &p, NULL, &u);
>>> +                break;
>>> +            }
>>> +
>>> +            if (end < req_end) {
>>> +                u64 new_addr = addr;
>>> +                u64 new_range = req_range + ls_range;
>>> +                u64 new_offset = offset;
>>> +
>>> +                /* We validated that the requested mapping is
>>> +                 * within a single VA region already.
>>> +                 * Since it overlaps the current mapping (which
>>> +                 * can't cross a VA region boundary) we can be
>>> +                 * sure that we're still within the boundaries
>>> +                 * of the same VA region after merging.
>>> +                 */
>>> +                if (merge) {
>>> +                    req_offset = new_offset;
>>> +                    req_addr = new_addr;
>>> +                    req_range = new_range;
>>> +                    op_unmap_new_to_list(&ops->list, va, true);
>>> +                    goto next;
>>> +                }
>>> +
>>> +                op_remap_new_to_list(&ops->list, &p, NULL, &u);
>>> +                goto next;
>>> +            }
>>> +
>>> +            if (end > req_end) {
>>> +                struct drm_gpuva_op_map n = {
>>> +                    .va.addr = req_end,
>>> +                    .va.range = end - req_end,
>>> +                    .gem.obj = obj,
>>> +                    .gem.offset = offset + ls_range +
>>> +                              req_range,
>>> +                };
>>> +
>>> +                if (merge)
>>> +                    goto done;
>>> +
>>> +                op_remap_new_to_list(&ops->list, &p, &n, &u);
>>> +                break;
>>> +            }
>>> +        } else if (addr > req_addr) {
>>> +            bool merge = obj == req_obj &&
>>> +                     offset == req_offset +
>>> +                           (addr - req_addr);
>>> +            if (!prev)
>>> +                skip_pmerge = false;
>>> +
>>> +            if (end == req_end) {
>>> +                op_unmap_new_to_list(&ops->list, va, merge);
>>> +                break;
>>> +            }
>>> +
>>> +            if (end < req_end) {
>>> +                skip_nmerge = false;
>>> +                op_unmap_new_to_list(&ops->list, va, merge);
>>> +                goto next;
>>> +            }
>>> +
>>> +            if (end > req_end) {
>>> +                struct drm_gpuva_op_map n = {
>>> +                    .va.addr = req_end,
>>> +                    .va.range = end - req_end,
>>> +                    .gem.obj = obj,
>>> +                    .gem.offset = offset + req_end - addr,
>>> +                };
>>> +                struct drm_gpuva_op_unmap u = { .va = va };
>>> +                u64 new_end = end;
>>> +                u64 new_range = new_end - req_addr;
>>> +
>>> +                /* We validated that the requested mapping is
>>> +                 * within a single VA region already.
>>> +                 * Since it overlaps the current mapping (which
>>> +                 * can't cross a VA region boundary) we can be
>>> +                 * sure that we're still within the boundaries
>>> +                 * of the same VA region after merging.
>>> +                 */
>>> +                if (merge) {
>>> +                    req_end = new_end;
>>> +                    req_range = new_range;
>>> +                    op_unmap_new_to_list(&ops->list, va, true);
>>> +                    break;
>>> +                }
>>> +
>>> +                op_remap_new_to_list(&ops->list, NULL, &n, &u);
>>> +                break;
>>> +            }
>>> +        }
>>> +next:
>>> +        prev = va;
>>> +    }
>>> +
>>> +    va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
>>> +    if (va) {
>>> +        struct drm_gem_object *obj = va->gem.obj;
>>> +        u64 offset = va->gem.offset;
>>> +        u64 addr = va->node.start;
>>> +        u64 range = va->node.size;
>>> +        u64 new_offset = offset;
>>> +        u64 new_addr = addr;
>>> +        u64 new_range = req_range + range;
>>> +        bool merge = obj == req_obj &&
>>> +                 offset + range == req_offset;
>>> +
>>> +        /* Don't merge over VA region boundaries. */
>>> +        merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
>>> +        if (merge) {
>>> +            op_unmap_new_to_list(&ops->list, va, true);
>>> +
>>> +            req_offset = new_offset;
>>> +            req_addr = new_addr;
>>> +            req_range = new_range;
>>> +        }
>>> +    }
>>> +
>>> +    va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
>>> +    if (va) {
>>> +        struct drm_gem_object *obj = va->gem.obj;
>>> +        u64 offset = va->gem.offset;
>>> +        u64 addr = va->node.start;
>>> +        u64 range = va->node.size;
>>> +        u64 end = addr + range;
>>> +        u64 new_range = req_range + range;
>>> +        u64 new_end = end;
>>> +        bool merge = obj == req_obj &&
>>> +                 offset == req_offset + req_range;
>>> +
>>> +        /* Don't merge over VA region boundaries. */
>>> +        merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
>>> +        if (merge) {
>>> +            op_unmap_new_to_list(&ops->list, va, true);
>>> +
>>> +            req_range = new_range;
>>> +            req_end = new_end;
>>> +        }
>>> +    }
>>> +
>>> +    op_map_new_to_list(&ops->list,
>>> +               req_addr, req_range,
>>> +               req_obj, req_offset);
>>> +
>>> +done:
>>> +    return ops;
>>> +
>>> +err_free_ops:
>>> +    drm_gpuva_ops_free(ops);
>>> +    return ERR_PTR(ret);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
>>> +
>>> +#undef op_map_new_to_list
>>> +#undef op_remap_new_to_list
>>> +#undef op_unmap_new_to_list
>>> +
>>> +/**
>>> + * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to 
>>> split on unmap
>>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>>> + * @req_addr: the start address of the range to unmap
>>> + * @req_range: the range of the mappings to unmap
>>> + *
>>> + * This function creates a list of operations to perform unmapping 
>>> and, if
>>> + * required, splitting of the mappings overlapping the unmap range.
>>> + *
>>> + * The list can be iterated with &drm_gpuva_for_each_op and must be 
>>> processed
>>> + * in the given order. It can contain unmap and remap operations, 
>>> depending on
>>> + * whether there are actual overlapping mappings to split.
>>> + *
>>> + * There can be an arbitrary number of unmap operations and a 
>>> maximum of two
>>> + * remap operations.
>>> + *
>>> + * Note that before calling this function again with another range 
>>> to unmap it
>>> + * is necessary to update the &drm_gpuva_manager's view of the GPU 
>>> VA space.
>>> + * The previously obtained operations must be processed or abandoned.
>>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>>> + * drm_gpuva_destroy_unlocked() should be used.
>>> + *
>>> + * After the caller finished processing the returned &drm_gpuva_ops, 
>>> they must
>>> + * be freed with &drm_gpuva_ops_free.
>>> + *
>>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR 
>>> on failure
>>> + */
>>> +struct drm_gpuva_ops *
>>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>>> +                  u64 req_addr, u64 req_range)
>>> +{
>>> +    struct drm_gpuva_ops *ops;
>>> +    struct drm_gpuva_op *op;
>>> +    struct drm_gpuva_op_remap *r;
>>> +    struct drm_gpuva *va;
>>> +    u64 req_end = req_addr + req_range;
>>> +    int ret;
>>> +
>>> +    ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>>> +    if (!ops)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    INIT_LIST_HEAD(&ops->list);
>>> +
>>> +    drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>>> +        struct drm_gem_object *obj = va->gem.obj;
>>> +        u64 offset = va->gem.offset;
>>> +        u64 addr = va->node.start;
>>> +        u64 range = va->node.size;
>>> +        u64 end = addr + range;
>>> +
>>> +        op = kzalloc(sizeof(*op), GFP_KERNEL);
>>> +        if (!op) {
>>> +            ret = -ENOMEM;
>>> +            goto err_free_ops;
>>> +        }
>>> +
>>> +        r = &op->remap;
>>> +
>>> +        if (addr < req_addr) {
>>> +            r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
>>> +            if (!r->prev) {
>>> +                ret = -ENOMEM;
>>> +                goto err_free_op;
>>> +            }
>>> +
>>> +            r->prev->va.addr = addr;
>>> +            r->prev->va.range = req_addr - addr;
>>> +            r->prev->gem.obj = obj;
>>> +            r->prev->gem.offset = offset;
>>> +        }
>>> +
>>> +        if (end > req_end) {
>>> +            r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
>>> +            if (!r->next) {
>>> +                ret = -ENOMEM;
>>> +                goto err_free_prev;
>>> +            }
>>> +
>>> +            r->next->va.addr = req_end;
>>> +            r->next->va.range = end - req_end;
>>> +            r->next->gem.obj = obj;
>>> +            r->next->gem.offset = offset + (req_end - addr);
>>> +        }
>>> +
>>> +        if (op->remap.prev || op->remap.next) {
>>> +            op->op = DRM_GPUVA_OP_REMAP;
>>> +            r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
>>> +            if (!r->unmap) {
>>> +                ret = -ENOMEM;
>>> +                goto err_free_next;
>>> +            }
>>> +
>>> +            r->unmap->va = va;
>>> +        } else {
>>> +            op->op = DRM_GPUVA_OP_UNMAP;
>>> +            op->unmap.va = va;
>>> +        }
>>> +
>>> +        list_add_tail(&op->entry, &ops->list);
>>> +    }
>>> +
>>> +    return ops;
>>> +
>>> +err_free_next:
>>> +    if (r->next)
>>> +        kfree(r->next);
>>> +err_free_prev:
>>> +    if (r->prev)
>>> +        kfree(r->prev);
>>> +err_free_op:
>>> +    kfree(op);
>>> +err_free_ops:
>>> +    drm_gpuva_ops_free(ops);
>>> +    return ERR_PTR(ret);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
>>> +
>>> +/**
>>> + * drm_gpuva_ops_free - free the given &drm_gpuva_ops
>>> + * @ops: the &drm_gpuva_ops to free
>>> + *
>>> + * Frees the given &drm_gpuva_ops structure including all the ops 
>>> associated
>>> + * with it.
>>> + */
>>> +void
>>> +drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
>>> +{
>>> +    struct drm_gpuva_op *op, *next;
>>> +
>>> +    drm_gpuva_for_each_op_safe(op, next, ops) {
>>> +        list_del(&op->entry);
>>> +        if (op->op == DRM_GPUVA_OP_REMAP) {
>>> +            if (op->remap.prev)
>>> +                kfree(op->remap.prev);
>>> +
>>> +            if (op->remap.next)
>>> +                kfree(op->remap.next);
>>> +
>>> +            kfree(op->remap.unmap);
>>> +        }
>>> +        kfree(op);
>>> +    }
>>> +
>>> +    kfree(ops);
>>> +}
>>> +EXPORT_SYMBOL(drm_gpuva_ops_free);
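
As a usage sketch of the ops interface above (not part of the patch;
error handling trimmed and the actual processing only hinted at):

    struct drm_gpuva_ops *ops;

    ops = drm_gpuva_sm_map_ops_create(mgr, req_addr, req_range,
                                      req_obj, req_offset);
    if (IS_ERR(ops))
            return PTR_ERR(ops);

    /* Process the ops in the given order (see drm_gpuva_for_each_op),
     * updating the manager's view with drm_gpuva_insert() and
     * drm_gpuva_destroy_locked()/_unlocked() before requesting the next
     * set of ops.
     */

    drm_gpuva_ops_free(ops);
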
>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>> index d7c521e8860f..6feacd93aca6 100644
>>> --- a/include/drm/drm_drv.h
>>> +++ b/include/drm/drm_drv.h
>>> @@ -104,6 +104,12 @@ enum drm_driver_feature {
>>>        * acceleration should be handled by two drivers that are 
>>> connected using auxiliary bus.
>>>        */
>>>       DRIVER_COMPUTE_ACCEL            = BIT(7),
>>> +    /**
>>> +     * @DRIVER_GEM_GPUVA:
>>> +     *
>>> +     * Driver supports user defined GPU VA bindings for GEM objects.
>>> +     */
>>> +    DRIVER_GEM_GPUVA        = BIT(8),
>>>       /* IMPORTANT: Below are all the legacy flags, add new ones 
>>> above. */
>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>> index 772a4adf5287..4a3679034966 100644
>>> --- a/include/drm/drm_gem.h
>>> +++ b/include/drm/drm_gem.h
>>> @@ -36,6 +36,8 @@
>>>   #include <linux/kref.h>
>>>   #include <linux/dma-resv.h>
>>> +#include <linux/list.h>
>>> +#include <linux/mutex.h>
>>>   #include <drm/drm_vma_manager.h>
>>> @@ -337,6 +339,17 @@ struct drm_gem_object {
>>>        */
>>>       struct dma_resv _resv;
>>> +    /**
>>> +     * @gpuva:
>>> +     *
>>> +     * Provides the list and list mutex of GPU VAs attached to this
>>> +     * GEM object.
>>> +     */
>>> +    struct {
>>> +        struct list_head list;
>>> +        struct mutex mutex;
>>> +    } gpuva;
>>> +
>>>       /**
>>>        * @funcs:
>>>        *
>>> @@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru 
>>> *lru, struct drm_gem_object *obj);
>>>   unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned 
>>> nr_to_scan,
>>>                      bool (*shrink)(struct drm_gem_object *obj));
>>> +/**
>>> + * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
>>> + * @obj: the &drm_gem_object
>>> + *
>>> + * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
>>> + * protecting it.
>>> + *
>>> + * Calling this function is only necessary for drivers intending to 
>>> support the
>>> + * &drm_driver_feature DRIVER_GEM_GPUVA.
>>> + */
>>> +static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>>> +{
>>> +    INIT_LIST_HEAD(&obj->gpuva.list);
>>> +    mutex_init(&obj->gpuva.mutex);
>>> +}
>>> +
>>> +/**
>>> + * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
>>> + * @obj: the &drm_gem_object
>>> + *
>>> + * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>>> + */
>>> +static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
>>> +{
>>> +    mutex_lock(&obj->gpuva.mutex);
>>> +}
>>> +
>>> +/**
>>> + * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
>>> + * @obj: the &drm_gem_object
>>> + *
>>> + * This unlocks the mutex protecting the &drm_gem_object's 
>>> &drm_gpuva list.
>>> + */
>>> +static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
>>> +{
>>> +    mutex_unlock(&obj->gpuva.mutex);
>>> +}
>>> +
>>> +/**
>>> + * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
>>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated 
>>> with
>>> + *
>>> + * This iterator walks over all &drm_gpuva structures associated 
>>> with the
>>> + * &drm_gem_object.
>>> + */
>>> +#define drm_gem_for_each_gpuva(entry, obj) \
>>> +    list_for_each_entry(entry, &obj->gpuva.list, head)
>>> +
>>> +/**
>>> + * drm_gem_for_each_gpuva_safe - iterator to safely walk over a 
>>> list of gpuvas
>>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>>> + * @next: &next &drm_gpuva to store the next step
>>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated 
>>> with
>>> + *
>>> + * This iterator walks over all &drm_gpuva structures associated 
>>> with the
>>> + * &drm_gem_object. It is implemented with 
>>> list_for_each_entry_safe(), hence
>>> + * it is safe against removal of elements.
>>> + */
>>> +#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
>>> +    list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
>>> +
>>>   #endif /* __DRM_GEM_H__ */
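
A short sketch of how a driver might use these helpers, e.g. to flag all
mappings of a BO after it got evicted (drm_gpuva_swap() is provided by
the new drm_gpuva_mgr.h below; the eviction context itself is assumed):

    struct drm_gpuva *va;

    drm_gem_gpuva_lock(obj);
    drm_gem_for_each_gpuva(va, obj)
            drm_gpuva_swap(va, true); /* backing BO is swapped out */
    drm_gem_gpuva_unlock(obj);
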
>>> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
>>> new file mode 100644
>>> index 000000000000..adeb0c916e91
>>> --- /dev/null
>>> +++ b/include/drm/drm_gpuva_mgr.h
>>> @@ -0,0 +1,527 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +#ifndef __DRM_GPUVA_MGR_H__
>>> +#define __DRM_GPUVA_MGR_H__
>>> +
>>> +/*
>>> + * Copyright (c) 2022 Red Hat.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person 
>>> obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, 
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom 
>>> the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be 
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include <drm/drm_mm.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/rbtree.h>
>>> +#include <linux/spinlock.h>
>>> +#include <linux/types.h>
>>> +
>>> +struct drm_gpuva_region;
>>> +struct drm_gpuva;
>>> +struct drm_gpuva_ops;
>>> +
>>> +/**
>>> + * struct drm_gpuva_manager - DRM GPU VA Manager
>>> + *
>>> + * The DRM GPU VA Manager keeps track of a GPU's virtual address 
>>> space by using
>>> + * the &drm_mm range allocator. Typically, this structure is 
>>> embedded in bigger
>>> + * driver structures.
>>> + *
>>> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. 
>>> bytes or
>>> + * pages.
>>> + *
>>> + * There should be one manager instance per GPU virtual address space.
>>> + */
>>> +struct drm_gpuva_manager {
>>> +    /**
>>> +     * @name: the name of the DRM GPU VA space
>>> +     */
>>> +    const char *name;
>>> +
>>> +    /**
>>> +     * @mm_start: start of the VA space
>>> +     */
>>> +    u64 mm_start;
>>> +
>>> +    /**
>>> +     * @mm_range: length of the VA space
>>> +     */
>>> +    u64 mm_range;
>>> +
>>> +    /**
>>> +     * @region_mm: the &drm_mm range allocator to track GPU VA regions
>>> +     */
>>> +    struct drm_mm region_mm;
>>> +
>>> +    /**
>>> +     * @va_mm: the &drm_mm range allocator to track GPU VA mappings
>>> +     */
>>> +    struct drm_mm va_mm;
>>> +
>>> +    /**
>>> +     * @kernel_alloc_node:
>>> +     *
>>> +     * &drm_mm_node representing the address space cutout reserved for
>>> +     * the kernel
>>> +     */
>>> +    struct drm_mm_node kernel_alloc_node;
>>> +};
>>> +
>>> +void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>>> +                const char *name,
>>> +                u64 start_offset, u64 range,
>>> +                u64 reserve_offset, u64 reserve_range);
>>> +void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
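
For illustration, a minimal init/teardown sketch (the numbers are made
up; per the documentation above the unit is driver defined):

    struct drm_gpuva_manager mgr;

    /* Manage a 128 GiB VA space in bytes, with the first 4 MiB
     * reserved for the kernel (tracked by @kernel_alloc_node).
     */
    drm_gpuva_manager_init(&mgr, "example", 0, 128ULL << 30,
                           0, 4ULL << 20);

    /* ... insert regions and mappings ... */

    drm_gpuva_manager_destroy(&mgr);
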
>>> +
>>> +/**
>>> + * struct drm_gpuva_region - structure to track a portion of GPU VA 
>>> space
>>> + *
>>> + * This structure represents a portion of a GPU's VA space and is 
>>> associated
>>> + * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>>> + *
>>> + * GPU VA mappings, represented by &drm_gpuva objects, are 
>>> restricted to be
>>> + * placed within a &drm_gpuva_region.
>>> + */
>>> +struct drm_gpuva_region {
>>> +    /**
>>> +     * @node: the &drm_mm_node to track the GPU VA region
>>> +     */
>>> +    struct drm_mm_node node;
>>> +
>>> +    /**
>>> +     * @mgr: the &drm_gpuva_manager this object is associated with
>>> +     */
>>> +    struct drm_gpuva_manager *mgr;
>>> +
>>> +    /**
>>> +     * @sparse: indicates whether this region is sparse
>>> +     */
>>> +    bool sparse;
>>> +};
>>> +
>>> +struct drm_gpuva_region *
>>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>>> +              u64 addr, u64 range);
>>> +int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>>> +                struct drm_gpuva_region *reg,
>>> +                u64 addr, u64 range);
>>> +void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>>> +                  struct drm_gpuva_region *reg);
>>> +
>>> +int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>>> +             struct drm_gpuva *va,
>>> +             u64 addr, u64 range);
>>> +/**
>>> + * drm_gpuva_for_each_region_in_range - iterator to walk over a 
>>> range of nodes
>>> + * @node__: &drm_gpuva_region structure to assign to in each 
>>> iteration step
>>> + * @gpuva__: &drm_gpuva_manager structure to walk
>>> + * @start__: starting offset, the first node will overlap this
>>> + * @end__: ending offset, the last node will start before this (but 
>>> may overlap)
>>> + *
>>> + * This iterator walks over all nodes in the range allocator that lie
>>> + * between @start and @end. It is implemented similarly to 
>>> list_for_each(),
>>> + * but is using &drm_mm's internal interval tree to accelerate the 
>>> search for
>>> + * the starting node, and hence isn't safe against removal of 
>>> elements. It
>>> + * assumes that @end is within (or is the upper limit of) the 
>>> &drm_gpuva_manager.
>>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, 
>>> the
>>> + * iterator may walk over the special _unallocated_ 
>>> &drm_mm.head_node of the
>>> + * backing &drm_mm, and may even continue indefinitely.
>>> + */
>>> +#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
>>> +    for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
>>> +                                     (start__), (end__)-1); \
>>> +         node__->node.start < (end__); \
>>> +         node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_region - iterator to walk over all regions
>>> + * @entry: &drm_gpuva_region structure to assign to in each 
>>> iteration step
>>> + * @gpuva: &drm_gpuva_manager structure to walk
>>> + *
>>> + * This iterator walks over all &drm_gpuva_region structures 
>>> associated with the
>>> + * &drm_gpuva_manager.
>>> + */
>>> +#define drm_gpuva_for_each_region(entry, gpuva) \
>>> +    list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), 
>>> node.node_list)
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_region_safe - iterator to safely walk over all regions
>>> + * @entry: &drm_gpuva_region structure to assign to in each 
>>> iteration step
>>> + * @next: &next &drm_gpuva_region to store the next step
>>> + * @gpuva: &drm_gpuva_manager structure to walk
>>> + *
>>> + * This iterator walks over all &drm_gpuva_region structures 
>>> associated with the
>>> + * &drm_gpuva_manager. It is implemented with list_for_each_entry_safe(),
>>> + * so it is safe against removal of elements.
>>> + */
>>> +#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
>>> +    list_for_each_entry_safe(entry, next, 
>>> drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>>> +
>>> +
>>> +/**
>>> + * enum drm_gpuva_flags - flags for struct drm_gpuva
>>> + */
>>> +enum drm_gpuva_flags {
>>> +    /**
>>> +     * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is 
>>> swapped
>>> +     */
>>> +    DRM_GPUVA_SWAPPED = (1 << 0),
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva - structure to track a GPU VA mapping
>>> + *
>>> + * This structure represents a GPU VA mapping and is associated with a
>>> + * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>>> + *
>>> + * Typically, this structure is embedded in bigger driver structures.
>>> + */
>>> +struct drm_gpuva {
>>> +    /**
>>> +     * @node: the &drm_mm_node to track the GPU VA mapping
>>> +     */
>>> +    struct drm_mm_node node;
>>> +
>>> +    /**
>>> +     * @mgr: the &drm_gpuva_manager this object is associated with
>>> +     */
>>> +    struct drm_gpuva_manager *mgr;
>>> +
>>> +    /**
>>> +     * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
>>> +     */
>>> +    struct drm_gpuva_region *region;
>>> +
>>> +    /**
>>> +     * @head: the &list_head to attach this object to a &drm_gem_object
>>> +     */
>>> +    struct list_head head;
>>> +
>>> +    /**
>>> +     * @flags: the &drm_gpuva_flags for this mapping
>>> +     */
>>> +    enum drm_gpuva_flags flags;
>>> +
>>> +    /**
>>> +     * @gem: structure containing the &drm_gem_object and its offset
>>> +     */
>>> +    struct {
>>> +        /**
>>> +         * @offset: the offset within the &drm_gem_object
>>> +         */
>>> +        u64 offset;
>>> +
>>> +        /**
>>> +         * @obj: the mapped &drm_gem_object
>>> +         */
>>> +        struct drm_gem_object *obj;
>>> +    } gem;
>>> +};
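
Since the structure is meant to be embedded, a driver-side sketch could
look like this (the driver names are hypothetical):

    struct driver_gpuva {
            struct drm_gpuva base;
            /* driver specific per-mapping state, e.g. PT handles */
    };

    static inline struct driver_gpuva *to_driver_gpuva(struct drm_gpuva *va)
    {
            return container_of(va, struct driver_gpuva, base);
    }
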
>>> +
>>> +void drm_gpuva_link_locked(struct drm_gpuva *va);
>>> +void drm_gpuva_link_unlocked(struct drm_gpuva *va);
>>> +void drm_gpuva_unlink_locked(struct drm_gpuva *va);
>>> +void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
>>> +
>>> +void drm_gpuva_destroy_locked(struct drm_gpuva *va);
>>> +void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
>>> +
>>> +struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
>>> +                 u64 addr, u64 range);
>>> +struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, 
>>> u64 start);
>>> +struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, 
>>> u64 end);
>>> +
>>> +/**
>>> + * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva 
>>> is swapped
>>> + * @va: the &drm_gpuva to set the swap flag of
>>> + * @swap: indicates whether the &drm_gpuva is swapped
>>> + */
>>> +static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
>>> +{
>>> +    if (swap)
>>> +        va->flags |= DRM_GPUVA_SWAPPED;
>>> +    else
>>> +        va->flags &= ~DRM_GPUVA_SWAPPED;
>>> +}
>>> +
>>> +/**
>>> + * drm_gpuva_swapped - indicates whether the backing BO of this 
>>> &drm_gpuva
>>> + * is swapped
>>> + * @va: the &drm_gpuva to check
>>> + */
>>> +static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
>>> +{
>>> +    return va->flags & DRM_GPUVA_SWAPPED;
>>> +}
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_va_in_range - iterator to walk over a range 
>>> of nodes
>>> + * @node__: &drm_gpuva structure to assign to in each iteration step
>>> + * @gpuva__: &drm_gpuva_manager structure to walk
>>> + * @start__: starting offset, the first node will overlap this
>>> + * @end__: ending offset, the last node will start before this (but 
>>> may overlap)
>>> + *
>>> + * This iterator walks over all nodes in the range allocator that lie
>>> + * between @start and @end. It is implemented similarly to 
>>> list_for_each(),
>>> + * but is using &drm_mm's internal interval tree to accelerate the 
>>> search for
>>> + * the starting node, and hence isn't safe against removal of 
>>> elements. It
>>> + * assumes that @end is within (or is the upper limit of) the 
>>> &drm_gpuva_manager.
>>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, 
>>> the
>>> + * iterator may walk over the special _unallocated_ 
>>> &drm_mm.head_node of the
>>> + * backing &drm_mm, and may even continue indefinitely.
>>> + */
>>> +#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
>>> +    for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
>>> +                                  (start__), (end__)-1); \
>>> +         node__->node.start < (end__); \
>>> +         node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_va - iterator to walk over all VA mappings
>>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>>> + * @gpuva: &drm_gpuva_manager structure to walk
>>> + *
>>> + * This iterator walks over all &drm_gpuva structures associated 
>>> with the
>>> + * &drm_gpuva_manager.
>>> + */
>>> +#define drm_gpuva_for_each_va(entry, gpuva) \
>>> +    list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), 
>>> node.node_list)
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_va_safe - iterator to safely walk over all VA mappings
>>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>>> + * @next: &next &drm_gpuva to store the next step
>>> + * @gpuva: &drm_gpuva_manager structure to walk
>>> + *
>>> + * This iterator walks over all &drm_gpuva structures associated 
>>> with the
>>> + * &drm_gpuva_manager. It is implemented with list_for_each_entry_safe(),
>>> + * so it is safe against removal of elements.
>>> + */
>>> +#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
>>> +    list_for_each_entry_safe(entry, next, 
>>> drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>>> +
>>> +/**
>>> + * enum drm_gpuva_op_type - GPU VA operation type
>>> + *
>>> + * Operations to alter the GPU VA mappings tracked by the 
>>> &drm_gpuva_manager
>>> + * can be map, remap or unmap operations.
>>> + */
>>> +enum drm_gpuva_op_type {
>>> +    /**
>>> +     * @DRM_GPUVA_OP_MAP: the map op type
>>> +     */
>>> +    DRM_GPUVA_OP_MAP,
>>> +
>>> +    /**
>>> +     * @DRM_GPUVA_OP_REMAP: the remap op type
>>> +     */
>>> +    DRM_GPUVA_OP_REMAP,
>>> +
>>> +    /**
>>> +     * @DRM_GPUVA_OP_UNMAP: the unmap op type
>>> +     */
>>> +    DRM_GPUVA_OP_UNMAP,
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva_op_map - GPU VA map operation
>>> + *
>>> + * This structure represents a single map operation generated by the
>>> + * DRM GPU VA manager.
>>> + */
>>> +struct drm_gpuva_op_map {
>>> +    /**
>>> +     * @va: structure containing address and range of a map
>>> +     * operation
>>> +     */
>>> +    struct {
>>> +        /**
>>> +         * @addr: the base address of the new mapping
>>> +         */
>>> +        u64 addr;
>>> +
>>> +        /**
>>> +         * @range: the range of the new mapping
>>> +         */
>>> +        u64 range;
>>> +    } va;
>>> +
>>> +    /**
>>> +     * @gem: structure containing the &drm_gem_object and its offset
>>> +     */
>>> +    struct {
>>> +        /**
>>> +         * @offset: the offset within the &drm_gem_object
>>> +         */
>>> +        u64 offset;
>>> +
>>> +        /**
>>> +         * @obj: the &drm_gem_object to map
>>> +         */
>>> +        struct drm_gem_object *obj;
>>> +    } gem;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva_op_unmap - GPU VA unmap operation
>>> + *
>>> + * This structure represents a single unmap operation generated by the
>>> + * DRM GPU VA manager.
>>> + */
>>> +struct drm_gpuva_op_unmap {
>>> +    /**
>>> +     * @va: the &drm_gpuva to unmap
>>> +     */
>>> +    struct drm_gpuva *va;
>>> +
>>> +    /**
>>> +     * @keep:
>>> +     *
>>> +     * Indicates whether this &drm_gpuva is physically contiguous 
>>> with the
>>> +     * original mapping request.
>>> +     *
>>> +     * Optionally, if &keep is set, drivers may keep the actual page 
>>> table
>>> +     * mappings for this &drm_gpuva, adding the missing page table 
>>> entries
>>> +     * only and updating the &drm_gpuva_manager accordingly.
>>> +     */
>>> +    bool keep;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva_op_remap - GPU VA remap operation
>>> + *
>>> + * This represents a single remap operation generated by the DRM GPU 
>>> VA manager.
>>> + *
>>> + * A remap operation is generated when an existing GPU VA mapping 
>>> is split up
>>> + * by inserting a new GPU VA mapping or by partially unmapping existing
>>> + * mapping(s), hence it consists of a maximum of two map and one unmap
>>> + * operation.
>>> + *
>>> + * The @unmap operation takes care of removing the original existing 
>>> mapping.
>>> + * @prev is used to remap the preceding part, @next the subsequent 
>>> part.
>>> + *
>>> + * If either a new mapping's start address is aligned with the start 
>>> address
>>> + * of the old mapping or the new mapping's end address is aligned 
>>> with the
>>> + * end address of the old mapping, either @prev or @next is NULL.
>>> + *
>>> + * Note, the reason for a dedicated remap operation, rather than 
>>> arbitrary
>>> + * unmap and map operations, is to give drivers the chance of 
>>> extracting driver
>>> + * specific data for creating the new mappings from the unmap 
>>> operation's
>>> + * &drm_gpuva structure which typically is embedded in larger driver 
>>> specific
>>> + * structures.
>>> + */
>>> +struct drm_gpuva_op_remap {
>>> +    /**
>>> +     * @prev: the preceding part of a split mapping
>>> +     */
>>> +    struct drm_gpuva_op_map *prev;
>>> +
>>> +    /**
>>> +     * @next: the subsequent part of a split mapping
>>> +     */
>>> +    struct drm_gpuva_op_map *next;
>>> +
>>> +    /**
>>> +     * @unmap: the unmap operation for the original existing mapping
>>> +     */
>>> +    struct drm_gpuva_op_unmap *unmap;
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva_op - GPU VA operation
>>> + *
>>> + * This structure represents a single generic operation, which can 
>>> be either
>>> + * map, unmap or remap.
>>> + *
>>> + * The particular type of the operation is defined by @op.
>>> + */
>>> +struct drm_gpuva_op {
>>> +    /**
>>> +     * @entry:
>>> +     *
>>> +     * The &list_head used to distribute instances of this struct 
>>> within
>>> +     * &drm_gpuva_ops.
>>> +     */
>>> +    struct list_head entry;
>>> +
>>> +    /**
>>> +     * @op: the type of the operation
>>> +     */
>>> +    enum drm_gpuva_op_type op;
>>> +
>>> +    union {
>>> +        /**
>>> +         * @map: the map operation
>>> +         */
>>> +        struct drm_gpuva_op_map map;
>>> +
>>> +        /**
>>> +         * @unmap: the unmap operation
>>> +         */
>>> +        struct drm_gpuva_op_unmap unmap;
>>> +
>>> +        /**
>>> +         * @remap: the remap operation
>>> +         */
>>> +        struct drm_gpuva_op_remap remap;
>>> +    };
>>> +};
>>> +
>>> +/**
>>> + * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
>>> + */
>>> +struct drm_gpuva_ops {
>>> +    /**
>>> +     * @list: the &list_head
>>> +     */
>>> +    struct list_head list;
>>> +};
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_op - iterator to walk over all ops
>>> + * @op: &drm_gpuva_op to assign in each iteration step
>>> + * @ops: &drm_gpuva_ops to walk
>>> + *
>>> + * This iterator walks over all ops within a given list of operations.
>>> + */
>>> +#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, 
>>> &(ops)->list, entry)
>>> +
>>> +/**
>>> + * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
>>> + * @op: &drm_gpuva_op to assign in each iteration step
>>> + * @next: &next &drm_gpuva_op to store the next step
>>> + * @ops: &drm_gpuva_ops to walk
>>> + *
>>> + * This iterator walks over all ops within a given list of operations.
>>> + * It is implemented with list_for_each_entry_safe(), so it is safe
>>> + * against removal of elements.
>>> + */
>>> +#define drm_gpuva_for_each_op_safe(op, next, ops) \
>>> +    list_for_each_entry_safe(op, next, &(ops)->list, entry)
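
To make the expected processing a bit more concrete, a sketch of a
driver consuming such a list (the page table handling is driver
specific and only hinted at in the comments):

    struct drm_gpuva_op *op;

    drm_gpuva_for_each_op(op, ops) {
            switch (op->op) {
            case DRM_GPUVA_OP_MAP:
                    /* Set up PTEs for op->map.va.addr/.range backed by
                     * op->map.gem.obj at op->map.gem.offset, then track
                     * the new mapping via drm_gpuva_insert().
                     */
                    break;
            case DRM_GPUVA_OP_REMAP:
                    /* op->remap.unmap->va is the original mapping;
                     * op->remap.prev and/or op->remap.next describe the
                     * remainders that stay mapped.
                     */
                    break;
            case DRM_GPUVA_OP_UNMAP:
                    /* Tear down op->unmap.va and drop it from the
                     * manager, e.g. via drm_gpuva_destroy_locked().
                     */
                    break;
            }
    }
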
>>> +
>>> +struct drm_gpuva_ops *
>>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>>> +                u64 addr, u64 range,
>>> +                struct drm_gem_object *obj, u64 offset);
>>> +struct drm_gpuva_ops *
>>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>>> +                  u64 addr, u64 range);
>>> +void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
>>> +
>>> +#endif /* __DRM_GPUVA_MGR_H__ */
>>> -- 
>>> 2.39.0
>>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-01-19  5:23                     ` Matthew Brost
  2023-01-19 11:33                       ` drm_gpuva_manager requirements (was Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI) Christian König
@ 2023-02-06 14:48                       ` Oded Gabbay
  2023-03-16 16:39                         ` Danilo Krummrich
  1 sibling, 1 reply; 75+ messages in thread
From: Oded Gabbay @ 2023-02-06 14:48 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Danilo Krummrich, tzimmermann, linux-doc, nouveau, corbet,
	linux-kernel, dri-devel, bskeggs, jason, airlied,
	Christian König

On Thu, Jan 19, 2023 at 7:24 AM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Thu, Jan 19, 2023 at 05:04:32AM +0100, Danilo Krummrich wrote:
> > On 1/18/23 20:48, Christian König wrote:
> > > Am 18.01.23 um 20:17 schrieb Dave Airlie:
> > > > On Thu, 19 Jan 2023 at 02:54, Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > > On Wed, Jan 18, 2023 at 11:50 AM Danilo Krummrich
> > > > > <dakr@redhat.com> wrote:
> > > > > >
> > > > > >
> > > > > > On 1/18/23 17:30, Alex Deucher wrote:
> > > > > > > On Wed, Jan 18, 2023 at 11:19 AM Danilo Krummrich
> > > > > > > <dakr@redhat.com> wrote:
> > > > > > > > On 1/18/23 16:37, Christian König wrote:
> > > > > > > > > Am 18.01.23 um 16:34 schrieb Danilo Krummrich:
> > > > > > > > > > Hi Christian,
> > > > > > > > > >
> > > > > > > > > > On 1/18/23 09:53, Christian König wrote:
> > > > > > > > > > > Am 18.01.23 um 07:12 schrieb Danilo Krummrich:
> > > > > > > > > > > > This patch series provides a new UAPI for the Nouveau driver in
> > > > > > > > > > > > order to
> > > > > > > > > > > > support Vulkan features, such as
> > > > > > > > > > > > sparse bindings and sparse
> > > > > > > > > > > > residency.
> > > > > > > > > > > >
> > > > > > > > > > > > Furthermore, with the DRM GPUVA
> > > > > > > > > > > > manager it provides a new DRM core
> > > > > > > > > > > > feature to
> > > > > > > > > > > > keep track of GPU virtual address
> > > > > > > > > > > > (VA) mappings in a more generic way.
> > > > > > > > > > > >
> > > > > > > > > > > > The DRM GPUVA manager is indented to help drivers implement
> > > > > > > > > > > > userspace-manageable
> > > > > > > > > > > > GPU VA spaces in reference to the Vulkan API. In order to achieve
> > > > > > > > > > > > this goal it
> > > > > > > > > > > > serves the following purposes in this context.
> > > > > > > > > > > >
> > > > > > > > > > > >        1) Provide a dedicated range allocator to track GPU VA
> > > > > > > > > > > > allocations and
> > > > > > > > > > > >           mappings, making use of the drm_mm range allocator.
> > > > > > > > > > > This means that the ranges are allocated
> > > > > > > > > > > by the kernel? If yes that's
> > > > > > > > > > > a really really bad idea.
> > > > > > > > > > No, it's just for keeping track of the
> > > > > > > > > > ranges userspace has allocated.
> > > > > > > > > Ok, that makes more sense.
> > > > > > > > >
> > > > > > > > > So basically you have an IOCTL which asks kernel
> > > > > > > > > for a free range? Or
> > > > > > > > > what exactly is the drm_mm used for here?
> > > > > > > > Not even that, userspace provides both the base
> > > > > > > > address and the range,
> > > > > > > > the kernel really just keeps track of things.
> > > > > > > > Though, writing a UAPI on
> > > > > > > > top of the GPUVA manager asking for a free range instead would be
> > > > > > > > possible by just adding the corresponding wrapper functions to get a
> > > > > > > > free hole.
> > > > > > > >
> > > > > > > > Currently, and that's what I think I read out of
> > > > > > > > your question, the main
> > > > > > > > benefit of using drm_mm over simply stuffing the
> > > > > > > > entries into a list or
> > > > > > > > something boils down to easier collision detection and iterating
> > > > > > > > sub-ranges of the whole VA space.
> > > > > > > Why not just do this in userspace?  We have a range manager in
> > > > > > > libdrm_amdgpu that you could lift out into libdrm or some other
> > > > > > > helper.
> > > > > > The kernel still needs to keep track of the mappings within the various
> > > > > > VA spaces, e.g. it silently needs to unmap mappings that are backed by
> > > > > > BOs that get evicted and remap them once they're validated (or swapped
> > > > > > back in).
> > > > > Ok, you are just using this for maintaining the GPU VM space in
> > > > > the kernel.
> > > > >
> > > > Yes the idea behind having common code wrapping drm_mm for this is to
> > > > allow us to make the rules consistent across drivers.
> > > >
> > > > Userspace (generally Vulkan, some compute) has interfaces that pretty
> > > > much dictate a lot of how VMA tracking works, esp around lifetimes,
> > > > sparse mappings and splitting/merging underlying page tables, I'd
> > > > really like this to be more consistent across drivers, because already
> > > > I think we've seen with freedreno some divergence from amdgpu and we
> > > > also have i915/xe to deal with. I'd like to at least have one place
> > > > that we can say this is how it should work, since this is something
> > > > that *should* be consistent across drivers mostly, as it is more about
> > > > how the uapi is exposed.
> > >
> > > That's a really good idea, but the implementation with drm_mm won't work
> > > like that.
> > >
> > > We have Vulkan applications which use the sparse feature to create
> > > literally millions of mappings. That's why I have fine tuned the mapping
>
> Is this not an application issue? Millions of mappings seems a bit
> absurd to me.
If I look at the most extreme case for AI, assuming 256GB of HBM
memory and page mapping of 2MB, we get to 128K of mappings. But that's
really the extreme case imo. I assume most mappings will be much
larger. In fact, in the most realistic scenario of large-scale
training, a single user will probably map the entire HBM memory using
1GB pages.

I also have a question: could this GPUVA code manage VA range 
mappings for userptr mappings, assuming we work without svm/uva/usm
(pointer-is-a-pointer) ? Because then we are talking about possible
4KB mappings of 1 - 1.5 TB host server RAM (Implied in my question is
the assumption this can be used also for non-VK use-cases. Please tell
me if I'm totally wrong here).
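
(Back-of-the-envelope check of the numbers above, binary units assumed:
256 GiB / 2 MiB = 2^17 = 131072, i.e. ~128K mappings, while
1.5 TiB / 4 KiB ~= 4 * 10^8, i.e. ~400M mappings, so the userptr case
would be roughly 3000 times larger.)
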

Thanks,
Oded

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-06 13:27                               ` Danilo Krummrich
@ 2023-02-06 16:14                                 ` Christian König
  2023-02-06 18:20                                   ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-06 16:14 UTC (permalink / raw)
  To: Danilo Krummrich, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

Concentrating this discussion on a very big misunderstanding first.

Am 06.02.23 um 14:27 schrieb Danilo Krummrich:
> [SNIP]
> My understanding is that userspace is fully responsible for the parts 
> of the GPU VA space it owns. This means that userspace needs to take 
> care to *not* ask the kernel to modify mappings that are in use currently.

This is a completely wrong assumption! Take a look at what games like 
Forza Horizon are doing.

Basically that game allocates a very big sparse area and fills it with 
pages from BOs while shaders are accessing it. And yes, as far as I know 
this is completely valid behavior.

So you need to be able to handle this case anyway and the approach with 
the regions won't help you at all preventing that.

Regards,
Christian.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-06 16:14                                 ` Christian König
@ 2023-02-06 18:20                                   ` Danilo Krummrich
  2023-02-07  9:35                                     ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-06 18:20 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 2/6/23 17:14, Christian König wrote:
> Concentrating this discussion on a very big misunderstanding first.
> 
> Am 06.02.23 um 14:27 schrieb Danilo Krummrich:
>> [SNIP]
>> My understanding is that userspace is fully responsible for the parts 
>> of the GPU VA space it owns. This means that userspace needs to take 
>> care to *not* ask the kernel to modify mappings that are in use 
>> currently.
> 
> This is a completely wrong assumption! Take a look at what games like 
> Forza Horizon are doing.
> 
> Basically that game allocates a very big sparse area and fills it with 
> pages from BOs while shaders are accessing it. And yes, as far as I know 
> this is completely valid behavior.

I also think this is valid behavior. That's not the problem I'm trying 
to describe. In this case userspace modifies the VA space 
*intentionally* while shaders are accessing it, because it knows that 
the shaders can deal with reading 0s.

Just to have it all in place, the example I gave was:
  - two virtually contiguous buffers A and B
  - binding 1 mapped to A with BO offset 0
  - binding 2 mapped to B with BO offset length(A)

What I did not mention both A and B aren't sparse buffers in this 
example, although it probably doesn't matter too much.

Since the conditions to do so are given, we merge binding 1 and binding 
2 right at the time when binding 2 is requested. To do so a driver might 
unmap binding 1 for a very short period of time (e.g. to (re-)map the 
freshly merged binding with a different page size if possible).

 From userspace perspective buffer A is ready to use before applying 
binding 2 to buffer B, hence it would be illegal to touch binding 1 
again when userspace asks the kernel to map binding 2 to buffer B.

Besides that I think there is no point in merging between buffers anyway 
because we'd end up splitting such a merged mapping anyway later on when 
one of the two buffers is destroyed.

Also, I think the same applies to sparse buffers as well, a mapping 
within A isn't expected to be re-mapped just because something is mapped 
to B.

However, in this context I start wondering if re-mapping in the context 
of merge and split is allowed at all, even within the same sparse buffer 
(and even with a separate page table for sparse mappings as described in 
my last mail; shaders would never fault).

> 
> So you need to be able to handle this case anyway and the approach with 
> the regions won't help you at all preventing that.
> 
> Regards,
> Christian.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-06 18:20                                   ` Danilo Krummrich
@ 2023-02-07  9:35                                     ` Christian König
  2023-02-07 10:50                                       ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-07  9:35 UTC (permalink / raw)
  To: Danilo Krummrich, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

Am 06.02.23 um 19:20 schrieb Danilo Krummrich:
> On 2/6/23 17:14, Christian König wrote:
>> Concentrating this discussion on a very big misunderstanding first.
>>
>> Am 06.02.23 um 14:27 schrieb Danilo Krummrich:
>>> [SNIP]
>>> My understanding is that userspace is fully responsible for the parts 
>>> of the GPU VA space it owns. This means that userspace needs to take 
>>> care to *not* ask the kernel to modify mappings that are in use 
>>> currently.
>>
>> This is a completely wrong assumption! Take a look at what games like 
>> Forza Horizon are doing.
>>
>> Basically that game allocates a very big sparse area and fills it 
>> with pages from BOs while shaders are accessing it. And yes, as far 
>> as I know this is completely valid behavior.
>
> I also think this is valid behavior. That's not the problem I'm trying 
> to describe. In this case userspace modifies the VA space 
> *intentionally* while shaders are accessing it, because it knows that 
> the shaders can deal with reading 0s.

No, it's perfectly valid for userspace to modify the VA space even if 
shaders are not supposed to deal with reading 0s.

>
>
> Just to have it all in place, the example I gave was:
>  - two virtually contiguous buffers A and B
>  - binding 1 mapped to A with BO offset 0
>  - binding 2 mapped to B with BO offset length(A)
>
> What I did not mention both A and B aren't sparse buffers in this 
> example, although it probably doesn't matter too much.
>
> Since the conditions to do so are given, we merge binding 1 and 
> binding 2 right at the time when binding 2 is requested. To do so a 
> driver might unmap binding 1 for a very short period of time (e.g. to 
> (re-)map the freshly merged binding with a different page size if 
> possible).

Nope, that's not correct handling.

>
> From userspace perspective buffer A is ready to use before applying 
> binding 2 to buffer B, hence it would be illegal to touch binding 1 
> again when userspace asks the kernel to map binding 2 to buffer B.
>
> Besides that I think there is no point in merging between buffers 
> anyway because we'd end up splitting such a merged mapping anyway 
> later on when one of the two buffers is destroyed.
>
> Also, I think the same applies to sparse buffers as well, a mapping 
> within A isn't expected to be re-mapped just because something is 
> mapped to B.
>
> However, in this context I start wondering if re-mapping in the 
> context of merge and split is allowed at all, even within the same 
> sparse buffer (and even with a separate page table for sparse mappings 
> as described in my last mail; shaders would never fault).

See, your assumption that userspace/applications don't modify the VA 
space intentionally while the GPU is accessing it is, bluntly speaking, 
incorrect.

When you have a VA address which is mapped to buffer A and accessed by 
some GPU shaders it is perfectly valid for the application to say "map 
it again to the same buffer A".

It is also perfectly valid for an application to re-map this region to a 
different buffer B, it's just not defined when the access then transits 
from A to B. (AFAIK this is currently worked on in a new specification).

So when your page table updates result in the shader intermediately 
getting 0s in return because you change the underlying mapping, you 
simply have some implementation bug in Nouveau.

I don't know how Nvidia hw handles this, and yes it's quite complicated 
on AMD hw as well because our TLBs are not really made for this use 
case, but I'm 100% sure that this is possible since it is still part of 
some of the specifications (mostly Vulkan I think).

To sum it up, as far as I can see, giving the regions to the kernel is 
not something you would want for Nouveau either.

Regards,
Christian.


>
>>
>> So you need to be able to handle this case anyway and the approach 
>> with the regions won't help you at all preventing that.
>>
>> Regards,
>> Christian.
>>
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-07  9:35                                     ` Christian König
@ 2023-02-07 10:50                                       ` Danilo Krummrich
  2023-02-10 11:50                                         ` Christian König
  0 siblings, 1 reply; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-07 10:50 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 2/7/23 10:35, Christian König wrote:
> Am 06.02.23 um 19:20 schrieb Danilo Krummrich:
>> On 2/6/23 17:14, Christian König wrote:
>>> Concentrating this discussion on a very big misunderstanding first.
>>>
>>> Am 06.02.23 um 14:27 schrieb Danilo Krummrich:
>>>> [SNIP]
>>>> My understanding is that userspace is fully responsible for the parts 
>>>> of the GPU VA space it owns. This means that userspace needs to take 
>>>> care to *not* ask the kernel to modify mappings that are in use 
>>>> currently.
>>>
>>> This is a completely wrong assumption! Take a look at what games like 
>>> Forza Horizon are doing.
>>>
>>> Basically that game allocates a very big sparse area and fills it 
>>> with pages from BOs while shaders are accessing it. And yes, as far 
>>> as I know this is completely valid behavior.
>>
>> I also think this is valid behavior. That's not the problem I'm trying 
>> to describe. In this case userspace modifies the VA space 
>> *intentionally* while shaders are accessing it, because it knows that 
>> the shaders can deal with reading 0s.
> 
> No, it's perfectly valid for userspace to modify the VA space even if 
> shaders are not supposed to deal with reading 0s.
> 
>>
>>
>> Just to have it all in place, the example I gave was:
>>  - two virtually contiguous buffers A and B
>>  - binding 1 mapped to A with BO offset 0
>>  - binding 2 mapped to B with BO offset length(A)
>>
>> What I did not mention both A and B aren't sparse buffers in this 
>> example, although it probably doesn't matter too much.
>>
>> Since the conditions to do so are given, we merge binding 1 and 
>> binding 2 right at the time when binding 2 is requested. To do so a 
>> driver might unmap binding 1 for a very short period of time (e.g. to 
>> (re-)map the freshly merged binding with a different page size if 
>> possible).
> 
> Nope, that's not correct handling.

I agree, and that's exactly what I'm trying to say. However, I start 
noticing that this is not correct if it happens within the same buffer 
as well.

> 
>>
>> From userspace perspective buffer A is ready to use before applying 
>> binding 2 to buffer B, hence it would be illegal to touch binding 1 
>> again when userspace asks the kernel to map binding 2 to buffer B.
>>
>> Besides that I think there is no point in merging between buffers 
>> anyway because we'd end up splitting such a merged mapping anyway 
>> later on when one of the two buffers is destroyed.
>>
>> Also, I think the same applies to sparse buffers as well, a mapping 
>> within A isn't expected to be re-mapped just because something is 
>> mapped to B.
>>
>> However, in this context I start wondering if re-mapping in the 
>> context of merge and split is allowed at all, even within the same 
>> sparse buffer (and even with a separate page table for sparse mappings 
>> as described in my last mail; shaders would never fault).
> 
> See, your assumption that userspace/applications don't modify the VA 
> space intentionally while the GPU is accessing it is, bluntly speaking, 
> incorrect.
> 

I don't assume that. The opposite is the case. My assumption is that 
it's always OK for userspace to intentionally modify the VA space.

However, I also assumed that if userspace asks for e.g. a new mapping 
within a certain buffer it is OK for the kernel to apply further changes 
(e.g. re-organize PTs to split or merge) to the VA space that 
userspace isn't aware of. At least as long as they happen within the 
bounds of this particular buffer, but not for other buffers.

I think the reasoning I had in mind was that I thought if userspace asks 
for any modification of a given portion of the VA space (that is a 
VKBuffer) userspace must assume that until this modification (e.g. 
re-organization of PTs) is complete reading 0s intermediately may 
happen. This seems to be clearly wrong.

> When you have a VA address which is mapped to buffer A and accessed by 
> some GPU shaders it is perfectly valid for the application to say "map 
> it again to the same buffer A".
> 
> It is also perfectly valid for an application to re-map this region to a 
> different buffer B, it's just not defined when the access then transits 
> from A to B. (AFAIK this is currently worked on in a new specification).
> 
> So when your page table updates result in the shader intermediately 
> getting 0s in return because you change the underlying mapping, you 
> simply have some implementation bug in Nouveau.

Luckily that's not the case (anymore).

> 
> I don't know how Nvidia hw handles this, and yes it's quite complicated 
> on AMD hw as well because our TLBs are not really made for this use 
> case, but I'm 100% sure that this is possible since it is still part of 
> some of the specifications (mostly Vulkan I think).
> 
> To sum it up, as far as I can see, giving the regions to the kernel is 
> not something you would want for Nouveau either.

If, as it turns out, it's also not allowed to do what I described above 
within the same VKBuffer, I agree the bounds aren't needed for merging.

However, I still don't see why we would want to merge over buffer 
boundaries, because ultimately we'll end up splitting such a merged 
mapping later on anyway once one of the buffers is destroyed.

Also, as explained in one of the previous mails in nouveau we can have 
separate PTs for sparse mappings with large page sizes and separate PTs 
for memory backed mappings with smaller page sizes overlaying them. 
Hence, I need to track a single sparse mapping per buffer spanning the 
whole buffer (which I do with a region) and the actual memory backed 
mappings within the same range.

Now, this might or might not be unique for Nvidia hardware. If nouveau 
would be the only potential user, plus we don't care about potentially 
merging mappings over buffer boundaries and hence producing foreseeable 
splits of those merged mappings, we could get rid of regions entirely.

> 
> Regards,
> Christian.
> 
> 
>>
>>>
>>> So you need to be able to handle this case anyway and the approach 
>>> with the regions won't help you at all preventing that.
>>>
>>> Regards,
>>> Christian.
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-07 10:50                                       ` Danilo Krummrich
@ 2023-02-10 11:50                                         ` Christian König
  2023-02-10 12:47                                           ` Danilo Krummrich
  0 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-02-10 11:50 UTC (permalink / raw)
  To: Danilo Krummrich, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied



Am 07.02.23 um 11:50 schrieb Danilo Krummrich:
> On 2/7/23 10:35, Christian König wrote:
[SNIP]
>>>
>>> Just to have it all in place, the example I gave was:
>>>  - two virtually contiguous buffers A and B
>>>  - binding 1 mapped to A with BO offset 0
>>>  - binding 2 mapped to B with BO offset length(A)
>>>
>>> What I did not mention both A and B aren't sparse buffers in this 
>>> example, although it probably doesn't matter too much.
>>>
>>> Since the conditions to do so are given, we merge binding 1 and 
>>> binding 2 right at the time when binding 2 is requested. To do so a 
>>> driver might unmap binding 1 for a very short period of time (e.g. 
>>> to (re-)map the freshly merged binding with a different page size if 
>>> possible).
>>
>> Nope, that's not correct handling.
>
> I agree, and that's exactly what I'm trying to say. However, I start 
> noticing that this is not correct if it happens within the same buffer 
> as well.

Yes, exactly that's my point.

>
>>
>>>
>>> From userspace perspective buffer A is ready to use before applying 
>>> binding 2 to buffer B, hence it would be illegal to touch binding 1 
>>> again when userspace asks the kernel to map binding 2 to buffer B.
>>>
>>> Besides that I think there is no point in merging between buffers 
>>> anyway because we'd end up splitting such a merged mapping anyway 
>>> later on when one of the two buffers is destroyed.
>>>
>>> Also, I think the same applies to sparse buffers as well, a mapping 
>>> within A isn't expected to be re-mapped just because something is 
>>> mapped to B.
>>>
>>> However, in this context I start wondering if re-mapping in the 
>>> context of merge and split is allowed at all, even within the same 
>>> sparse buffer (and even with a separate page table for sparse 
>>> mappings as described in my last mail; shaders would never fault).
>>
>> See, your assumption that userspace/applications don't modify the 
>> VA space intentionally while the GPU is accessing it is, bluntly 
>> speaking, incorrect.
>>
>
> I don't assume that. The opposite is the case. My assumption is that 
> it's always OK for userspace to intentionally modify the VA space.
>
> However, I also assumed that if userspace asks for e.g. a new mapping 
> within a certain buffer it is OK for the kernel to apply further 
> changes (e.g. re-organize PTs to split or merge) to the VA space that 
> userspace isn't aware of. At least as long as they happen within 
> the bounds of this particular buffer, but not for other buffers.

Well, when this somehow affects shaders which access other parts of the 
buffer at the same time, then that won't work.

> I think the reasoning I had in mind was that I thought if userspace 
> asks for any modification of a given portion of the VA space (that is 
> a VKBuffer) userspace must assume that until this modification (e.g. 
> re-organization of PTs) is complete reading 0s intermediately may 
> happen. This seems to be clearly wrong.
>
>> When you have a VA address which is mapped to buffer A and accessed 
>> by some GPU shaders it is perfectly valid for the application to say 
>> "map it again to the same buffer A".
>>
>> It is also perfectly valid for an application to re-map this region 
>> to a different buffer B, it's just not defined when the access then 
>> transits from A to B. (AFAIK this is currently worked on in a new 
>> specification).
>>
>> So when your page table updates result in the shader intermediately
>> getting 0s in return because you change the underlying mapping, you
>> simply have some implementation bug in Nouveau.
>
> Luckily that's not the case (anymore).
>
>>
>> I don't know how Nvidia hw handles this, and yes it's quite 
>> complicated on AMD hw as well because our TLBs are not really made 
>> for this use case, but I'm 100% sure that this is possible since it 
>> is still part of some of the specifications (mostly Vulkan I think).
>>
>> To sum it up, as far as I can see giving the regions to the kernel
>> is not something you would want for Nouveau either.
>
> If, as it turns out, it's also not allowed to do what I described 
> above within the same VKBuffer, I agree the bounds aren't needed for 
> merging.
>
> However, I still don't see why we would want to merge over buffer 
> boundaries, because ultimately we'll end up splitting such a merged 
> mapping later on anyway once one of the buffers is destroyed.

Well, the key point is that all approaches have some pros and cons.

If we merge and decide to only do that inside certain boundaries, then
those boundaries need to be provided and checked against. This burns
quite some CPU cycles.

If we just merge what we can, we might have extra page table updates
which cost time and could result in undesired side effects.

If we don't merge at all, we have additional housekeeping for the
mappings and maybe hw restrictions.

> Also, as explained in one of the previous mails in nouveau we can have 
> separate PTs for sparse mappings with large page sizes and separate 
> PTs for memory backed mappings with smaller page sizes overlaying 
> them. Hence, I need to track a single sparse mapping per buffer 
> spanning the whole buffer (which I do with a region) and the actual 
> memory backed mappings within the same range.
>
> Now, this might or might not be unique for Nvidia hardware. If nouveau 
> would be the only potential user, plus we don't care about potentially 
> merging mappings over buffer boundaries and hence producing 
> foreseeable splits of those merged mappings, we could get rid of 
> regions entirely.

This sounds similar to what AMD hw used to have up until gfx8 (I think):
basically, sparse resources were defined through a separate mechanism
from the address resolution of the page tables. I won't rule out that
other hardware has similar approaches.

On the other hand, when you have separate page tables for address
translation and sparse handling, then why not instantiate two separate
VM manager instances for them?

Regards,
Christian.

>
>>
>> Regards,
>> Christian.
>>
>>
>>>
>>>>
>>>> So you need to be able to handle this case anyway and the approach 
>>>> with the regions won't help you at all preventing that.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
  2023-02-10 11:50                                         ` Christian König
@ 2023-02-10 12:47                                           ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-10 12:47 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Matthew Brost, daniel, corbet, dri-devel, linux-doc,
	linux-kernel, mripard, bskeggs, jason, nouveau, airlied

On 2/10/23 12:50, Christian König wrote:
> 
> 
> Am 07.02.23 um 11:50 schrieb Danilo Krummrich:
>> On 2/7/23 10:35, Christian König wrote:

<snip>

>> However, I still don't see why we would want to merge over buffer 
>> boundaries, because ultimately we'll end up splitting such a merged 
>> mapping later on anyway once one of the buffers is destroyed.
> 
> Well, the key point is that all approaches have some pros and cons.
> 
> If we merge and decide to only do that inside certain boundaries, then
> those boundaries need to be provided and checked against. This burns
> quite some CPU cycles.
> 
> If we just merge what we can, we might have extra page table updates
> which cost time and could result in undesired side effects.
> 
> If we don't merge at all, we have additional housekeeping for the
> mappings and maybe hw restrictions.

Absolutely agree, hence I think it would be beneficial to leave the
decision of which approach to pick to the driver.

For instance, if a driver needs to keep track of these bounds anyway,
because it needs to track separate page tables for sparse regions, there
is no additional overhead, but the nice effect of being able to avoid
unnecessary merges and subsequent splits.
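
Just to illustrate what I mean, a minimal sketch of such a merge check -
the driver_* names are made up for this example and not part of any
actual API:

#include <linux/types.h>

/* Hypothetical driver-side region bookkeeping; names are invented for
 * illustration only.
 */
struct driver_region {
	u64 addr;
	u64 range;
};

/* Only merge two mappings if the merged mapping stays within the region
 * the driver tracks for its sparse page tables anyway - no additional
 * bookkeeping beyond what already exists.
 */
static bool driver_can_merge(const struct driver_region *reg,
			     u64 merged_addr, u64 merged_range)
{
	return merged_addr >= reg->addr &&
	       merged_addr + merged_range <= reg->addr + reg->range;
}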

> 
>> Also, as explained in one of the previous mails in nouveau we can have 
>> separate PTs for sparse mappings with large page sizes and separate 
>> PTs for memory backed mappings with smaller page sizes overlaying 
>> them. Hence, I need to track a single sparse mapping per buffer 
>> spanning the whole buffer (which I do with a region) and the actual 
>> memory backed mappings within the same range.
>>
>> Now, this might or might not be unique for Nvidia hardware. If nouveau 
>> would be the only potential user, plus we don't care about potentially 
>> merging mappings over buffer boundaries and hence producing 
>> foreseeable splits of those merged mappings, we could get rid of 
>> regions entirely.
> 
> This sounds similar to what AMD hw used to have up until gfx8 (I think):
> basically, sparse resources were defined through a separate mechanism
> from the address resolution of the page tables. I won't rule out that
> other hardware has similar approaches.
> 
> On the other hand, when you have separate page tables for address
> translation and sparse handling, then why not instantiate two separate
> VM manager instances for them?

As mentioned above, for some drivers there could be a synergy between
keeping track of those separate page tables and using these boundaries
for merge decisions.

Also, having a separate manager instance would lead to less lightweight
nodes for sparse regions, since we'd also carry the fields needed for
memory backed mappings. Furthermore, there wouldn't be a "generic
relationship" between the nodes of the two separate manager instances,
like a mapping node having a pointer to the region it resides in. This
may be useful to e.g. do some sanity checks, unmap all mappings of a
given region, etc.

Of course drivers could code this relationship within the driver
specific structures around the mapping nodes, but I think it would be
nice to generalize that and have it built in.
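
To make that a bit more concrete, this is roughly what every driver
would otherwise end up open coding around its nodes (just a sketch, the
driver_* names are invented):

#include <drm/drm_gpuva_mgr.h>

/* Sketch of driver structures embedding the manager's nodes and carrying
 * the region <-> mapping relationship themselves.
 */
struct driver_sparse_region {
	struct drm_gpuva_region region;		/* lightweight, no GEM fields */
	/* driver specific sparse page table state would live here */
};

struct driver_mapping {
	struct drm_gpuva gpuva;			/* memory backed mapping */
	struct driver_sparse_region *sparse;	/* back-pointer for sanity
						 * checks, unmap-all, etc. */
};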

> 
> Regards,
> Christian.
> 
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>>
>>>>>
>>>>> So you need to be able to handle this case anyway and the approach 
>>>>> with the regions won't help you at all preventing that.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings
  2023-02-03 17:37   ` Matthew Brost
  2023-02-06 13:35     ` Christian König
@ 2023-02-14 11:52     ` Danilo Krummrich
  1 sibling, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-02-14 11:52 UTC (permalink / raw)
  To: Matthew Brost
  Cc: daniel, airlied, christian.koenig, bskeggs, jason, tzimmermann,
	mripard, corbet, nouveau, linux-kernel, dri-devel, linux-doc

On 2/3/23 18:37, Matthew Brost wrote:
> On Wed, Jan 18, 2023 at 07:12:45AM +0100, Danilo Krummrich wrote:
>> This adds the infrastructure for a manager implementation to keep track
>> of GPU virtual address (VA) mappings.
>>
>> New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
>> start implementing, allow userspace applications to request multiple and
>> arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
>> intended to serve the following purposes in this context.
>>
>> 1) Provide a dedicated range allocator to track GPU VA allocations and
>>     mappings, making use of the drm_mm range allocator.
>>
>> 2) Generically connect GPU VA mappings to their backing buffers, in
>>     particular DRM GEM objects.
>>
>> 3) Provide a common implementation to perform more complex mapping
>>     operations on the GPU VA space. In particular splitting and merging
>>     of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>     unmap requests.
>>
> 
> Over the past week I've hacked together a PoC port of Xe to GPUVA [1]; so
> far it seems really promising. It's 95% of the way to being feature
> equivalent to the current Xe VM bind implementation, and I have line of
> sight to getting sparse bindings implemented on top of GPUVA too. IMO
> this has basically everything we need for Xe with a few tweaks.
> 
> I am out until 2/14 but wanted to get my thoughts / suggestions out on
> the list before I leave.

Thanks for your work on that!

> 1. The GPUVA post didn't support the way Xe does userptrs - a NULL GEM. I
> believe with [2], [3], and [4] GPUVA will support NULL GEMs. Also, my
> thinking is that sparse binds will also have NULL GEMs; more on sparse
> bindings below.
> 
> 2. I agree with Christian that drm_mm probably isn't what we want to
> base the GPUVA implementation on; rather, an RB tree or maple tree has
> been discussed. The implementation should be fairly easy to tune once we
> have benchmarks running, so I'm not too concerned here as we can figure
> this out down the line.
> 
> 3. In Xe we want to create an xe_vm_op list which inherits from
> drm_gpuva_op. I've done this with a hack [5]; I believe when we rebase we
> can do this with a custom callback to allocate a larger op size.
> 
> 4. I'd like to add user bits to drm_gpuva_flags like I do in [6]. This is
> similar to DMA_FENCE_FLAG_USER_BITS.
> 
> 5. In Xe we have a VM prefetch operation which is needed for our compute
> UMD with page faults. I'd like to add a prefetch type of operation like
> we do in [7].
> 
> 6. In Xe we have a "VM unbind all mappings for a GEM" IOCTL; I'd like to
> add support for generating this operation list to GPUVA like we do in [8].
> 
> 7. I've thought about how Xe will implement sparse mappings (reads return
> 0, writes are dropped). My current thinking is a sparse mapping will be
> represented as a drm_gpuva rather than a region like in Nouveau. Making
> regions optional seems like a good idea to me, rather than forcing the
> user of the GPUVA code to create 1 large region for the manager as I
> currently do in the Xe PoC.
> 
> 8. Personally I'd like the caller to own the locking for the GEM drm_gpuva
> list (drm_gpuva_link_*, drm_gpuva_unlink_* functions). In Xe we almost
> certainly will have the GEM dma-resv lock when we touch this list, so an
> extra lock here is redundant. Also, it's kinda goofy that the caller owns
> the locking for drm_gpuva insertion / removal but not the locking for
> this list.

Do you really mean having the dma-resv lock acquired, or having a fence
on the dma_resv obj?

In Nouveau I map/unmap gpuvas in the ttm_device_funcs' move() callback.
I only validate() and add a dma_resv fence to the GEMs of new mappings.
For unmap / remap operations I just take the GEM's gpuva list lock and
check whether the GEM is currently evicted. If it is currently evicted
(and hence unmapped) and I'm on a remap operation, I can just do the
update on the GPUVA space, since at the latest on the next EXEC ioctl()
the corresponding GEM is validated and hence re-mapped correctly. If
it's an unmap operation I just need to remove the GPUVA, since the
corresponding mapping is already unmapped when its GEM is evicted. If
it's not evicted I proceed as usual.

Anyway, drm_gpuva_insert() and drm_gpuva_remove() (was
drm_gpuva_destroy() before) do *not* implicitly add the gpuva to (or
remove it from) the GEM's gpuva list anymore. Instead there is only
drm_gpuva_link() and drm_gpuva_unlink(), which don't do any lockdep
checks, but clearly document that the caller is responsible to take care
of mutual exclusion.
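
To make the above a bit more tangible, the unmap path then roughly looks
like the following - just a sketch using the reworked helpers mentioned
above; the driver_* types, the lock and the evicted flag stand in for
the Nouveau specific bits:

#include <linux/mutex.h>
#include <drm/drm_gpuva_mgr.h>

/* Placeholder driver state; the lock and the evicted flag stand in for
 * the Nouveau specific bits.
 */
struct driver_va {
	struct drm_gpuva gpuva;
	struct mutex *gem_gpuva_lock;	/* protects the GEM's gpuva list */
	bool evicted;			/* driver-tracked eviction state */
};

static void driver_unmap_ptes(struct driver_va *dva);	/* driver specific */

static void driver_va_unmap(struct driver_va *dva)
{
	struct drm_gpuva *va = &dva->gpuva;

	/* Caller-owned locking for the GEM's gpuva list. */
	mutex_lock(dva->gem_gpuva_lock);
	if (!dva->evicted)
		driver_unmap_ptes(dva);	/* otherwise already unmapped */
	drm_gpuva_unlink(va);
	mutex_unlock(dva->gem_gpuva_lock);

	/* Finally drop the mapping from the GPUVA space itself. */
	drm_gpuva_remove(va);
}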

> 
> WRT Christian's thoughts on common uAPI rules for VM binds, I kinda
> like that idea but I don't think that is necessary. All of our uAPIs
> should be close, but also the GPUVA implementation should be flexible
> enough to fit all of our needs, and I think for the most part it is.
> 
> Let me know what everyone thinks about this. It would be great if, when
> I'm back on 2/14, I can rebase the Xe port to GPUVA on another version of
> the GPUVA code and get sparse binding support implemented. Also I'd
> like to get GPUVA merged in the Xe repo ASAP, as our VM bind code badly
> needed to be cleaned up and this was the push we needed to make this
> happen.

All those are great improvements and some of them I will pick up for
Nouveau as well.

Except for switching from drm_mm to maple_tree, I implemented all other
suggestions. You can find them here:
https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-fixes

I'm just about to start moving the GPUVA Manager to use maple_tree
instead and plan to send a V2 by the end of this week.
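
The rough idea for the switch looks like this - not the actual V2 code,
just a sketch with made-up sketch_* names, and it assumes a 64-bit
kernel since maple tree indices are unsigned long while GPU VAs are u64:

#include <linux/gfp.h>
#include <linux/maple_tree.h>
#include <drm/drm_gpuva_mgr.h>

static DEFINE_MTREE(sketch_va_mt);	/* would live in the manager, of course */

/* Key each drm_gpuva by its VA interval instead of using a drm_mm node. */
static int sketch_insert_va(struct drm_gpuva *va, u64 addr, u64 range)
{
	return mtree_store_range(&sketch_va_mt, addr, addr + range - 1, va,
				 GFP_KERNEL);
}

/* Any address within a stored interval resolves to its gpuva. */
static struct drm_gpuva *sketch_find_va(u64 addr)
{
	return mtree_load(&sketch_va_mt, addr);
}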

- Danilo

> 
> Matt
> 
> [1] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314
> [2] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=2ae21d7a3f52e5eb2c105ed8ae231471274bdc36
> [3] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=49fca9f5d96201f5cbd1b19c7ff17eedfac65cdc
> [4] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=61fa6b1e1f10e791ae82358fa971b04421d53024
> [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=87fc08dcf0840e794b38269fe4c6a95d088d79ec
> [6] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=a4826c22f6788bc29906ffa263c1cd3c4661fa77
> [7] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=f008bbb55b213868e52c7b9cda4c1bfb95af6aee
> [8] https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/314/diffs?commit_id=41f4f71c05d04d2b17d988dd95369b5df2d7f681
> 
>> Idea-suggested-by: Dave Airlie <airlied@redhat.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>   Documentation/gpu/drm-mm.rst    |   31 +
>>   drivers/gpu/drm/Makefile        |    1 +
>>   drivers/gpu/drm/drm_gem.c       |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c | 1323 +++++++++++++++++++++++++++++++
>>   include/drm/drm_drv.h           |    6 +
>>   include/drm/drm_gem.h           |   75 ++
>>   include/drm/drm_gpuva_mgr.h     |  527 ++++++++++++
>>   7 files changed, 1966 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index a52e6f4117d6..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -466,6 +466,37 @@ DRM MM Range Allocator Function References
>>   .. kernel-doc:: drivers/gpu/drm/drm_mm.c
>>      :export:
>>   
>> +DRM GPU VA Manager
>> +==================
>> +
>> +Overview
>> +--------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Overview
>> +
>> +Split and Merge
>> +---------------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Split and Merge
>> +
>> +Locking
>> +-------
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :doc: Locking
>> +
>> +
>> +DRM GPU VA Manager Function References
>> +--------------------------------------
>> +
>> +.. kernel-doc:: include/drm/drm_gpuva_mgr.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c
>> +   :export:
>> +
>>   DRM Buddy Allocator
>>   ===================
>>   
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 4fe190aee584..de2ffca3b6e4 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -45,6 +45,7 @@ drm-y := \
>>   	drm_vblank.o \
>>   	drm_vblank_work.o \
>>   	drm_vma_manager.o \
>> +	drm_gpuva_mgr.o \
>>   	drm_writeback.o
>>   drm-$(CONFIG_DRM_LEGACY) += \
>>   	drm_agpsupport.o \
>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>> index 59a0bb5ebd85..65115fe88627 100644
>> --- a/drivers/gpu/drm/drm_gem.c
>> +++ b/drivers/gpu/drm/drm_gem.c
>> @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>>   	if (!obj->resv)
>>   		obj->resv = &obj->_resv;
>>   
>> +	if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA))
>> +		drm_gem_gpuva_init(obj);
>> +
>>   	drm_vma_node_reset(&obj->vma_node);
>>   	INIT_LIST_HEAD(&obj->lru_node);
>>   }
>> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
>> new file mode 100644
>> index 000000000000..e665f642689d
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
>> @@ -0,0 +1,1323 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2022 Red Hat.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *     Danilo Krummrich <dakr@redhat.com>
>> + *
>> + */
>> +
>> +#include <drm/drm_gem.h>
>> +#include <drm/drm_gpuva_mgr.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * The DRM GPU VA Manager, represented by struct drm_gpuva_manager keeps track
>> + * of a GPU's virtual address (VA) space and manages the corresponding virtual
>> + * mappings represented by &drm_gpuva objects. It also keeps track of the
>> + * mapping's backing &drm_gem_object buffers.
>> + *
>> + * &drm_gem_object buffers maintain a list (and a corresponding list lock) of
>> + * &drm_gpuva objects representing all existent GPU VA mappings using this
>> + * &drm_gem_object as backing buffer.
>> + *
>> + * A GPU VA mapping can only be created within a previously allocated
>> + * &drm_gpuva_region, which represents a reserved portion of the GPU VA space.
>> + * GPU VA mappings are not allowed to span over a &drm_gpuva_region's boundary.
>> + *
>> + * GPU VA regions can also be flagged as sparse, which allows drivers to create
>> + * sparse mappings for a whole GPU VA region in order to support Vulkan
>> + * 'Sparse Resources'.
>> + *
>> + * The GPU VA manager internally uses the &drm_mm range allocator to manage the
>> + * &drm_gpuva mappings and the &drm_gpuva_regions within a GPU's virtual address
>> + * space.
>> + *
>> + * Besides the GPU VA space regions (&drm_gpuva_region) allocated by a driver
>> + * the &drm_gpuva_manager contains a special region representing the portion of
>> + * VA space reserved by the kernel. This node is initialized together with the
>> + * GPU VA manager instance and removed when the GPU VA manager is destroyed.
>> + *
>> + * In a typical application drivers would embed struct drm_gpuva_manager,
>> + * struct drm_gpuva_region and struct drm_gpuva within their own driver
>> + * specific structures; there won't be any memory allocations of its own nor
>> + * memory allocations of &drm_gpuva or &drm_gpuva_region entries.
>> + */
>> +
>> +/**
>> + * DOC: Split and Merge
>> + *
>> + * The DRM GPU VA manager also provides an algorithm implementing splitting and
>> + * merging of existent GPU VA mappings with the ones that are requested to be
>> + * mapped or unmapped. This feature is required by the Vulkan API to implement
>> + * Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this as
>> + * VM BIND.
>> + *
>> + * Drivers can call drm_gpuva_sm_map_ops_create() to obtain a list of map, unmap
>> + * and remap operations for a given newly requested mapping. This list
>> + * represents the set of operations to execute in order to integrate the new
>> + * mapping cleanly into the current state of the GPU VA space.
>> + *
>> + * Depending on how the new GPU VA mapping intersects with the existent mappings
>> + * of the GPU VA space the &drm_gpuva_ops contain an arbitrary amount of unmap
>> + * operations, a maximum of two remap operations and a single map operation.
>> + * The set of operations can also be empty if no operation is required, e.g. if
>> + * the requested mapping already exists in the exact same way.
>> + *
>> + * The single map operation, if existent, represents the original map operation
>> + * requested by the caller. Please note that this operation might be altered
>> + * compared to the original map operation, e.g. because it was merged with
>> + * an already existent mapping. Hence, drivers must execute this map operation
>> + * instead of the original one they passed to drm_gpuva_sm_map_ops_create().
>> + *
>> + * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
>> + * &drm_gpuva to unmap is physically contiguous with the original mapping
>> + * request. Optionally, if 'keep' is set, drivers may keep the actual page table
>> + * entries for this &drm_gpuva, adding the missing page table entries only and
>> + * update the &drm_gpuva_manager's view of things accordingly.
>> + *
>> + * Drivers may do the same optimization, namely delta page table updates, also
>> + * for remap operations. This is possible since &drm_gpuva_op_remap consists of
>> + * one unmap operation and one or two map operations, such that drivers can
>> + * derive the page table update delta accordingly.
>> + *
>> + * Note that there can't be more than two existent mappings to split up, one at
>> + * the beginning and one at the end of the new mapping, hence there is a
>> + * maximum of two remap operations.
>> + *
>> + * Generally, the DRM GPU VA manager never merges mappings across the
>> + * boundaries of &drm_gpuva_regions. This is the case since merging between
>> + * GPU VA regions would result in unmap and map operations being issued for
>> + * both regions involved, although the original mapping request referred to
>> + * one specific GPU VA region only. Since the other GPU VA region, the one not
>> + * explicitly requested to be altered, might be in use by the GPU, we are not
>> + * allowed to issue any map/unmap operations for this region.
>> + *
>> + * Note that before calling drm_gpuva_sm_map_ops_create() again with another
>> + * mapping request it is necessary to update the &drm_gpuva_manager's view of
>> + * the GPU VA space. The previously obtained operations must be either fully
>> + * processed or completely abandoned.
>> + *
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * Analogous to drm_gpuva_sm_map_ops_create(), drm_gpuva_sm_unmap_ops_create()
>> + * provides drivers the list of operations to be executed in order to unmap
>> + * a range of GPU VA space. The logic behind this function is way simpler
>> + * though: for all existent mappings enclosed by the given range, unmap
>> + * operations are created. For mappings which are only partially located within
>> + * the given range, remap operations are created such that those mappings are
>> + * split up and re-mapped partially.
>> + *
>> + * The following paragraph depicts the basic constellations of existent GPU VA
>> + * mappings, a newly requested mapping and the resulting mappings as implemented
>> + * by drm_gpuva_sm_map_ops_create()  - it doesn't cover arbitrary combinations
>> + * of those constellations.
>> + *
>> + * ::
>> + *
>> + *	1) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	2) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     1
>> + *	req: |-----------| (bo_offset=m)
>> + *
>> + *	     0     a     1
>> + *	new: |-----------| (bo_offset=m)
>> + *
>> + *
>> + *	3) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     1
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	4) Existent mapping is replaced.
>> + *	--------------------------------
>> + *
>> + *	     0  a  1
>> + *	old: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or bo_offset.
>> + *
>> + *
>> + *	5) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  b  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0  b  1  a' 2
>> + *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	6) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	     0  a  1
>> + *	req: |-----|       (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	7) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----| (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2
>> + *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	8) Existent mapping is kept.
>> + *	----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----| (bo_offset=n+1)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *
>> + *	9) Existent mapping is split.
>> + *	-----------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     b     3
>> + *	req:       |-----------| (bo_offset=m)
>> + *
>> + *	     0  a  1     b     3
>> + *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
>> + *
>> + *
>> + *	10) Existent mapping is merged.
>> + *	-------------------------------
>> + *
>> + *	     0     a     2
>> + *	old: |-----------|       (bo_offset=n)
>> + *
>> + *	           1     a     3
>> + *	req:       |-----------| (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	new: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	11) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  b  2
>> + *	req:       |-----|       (bo_offset=m)
>> + *
>> + *	     0  a  1  b  2  a' 3
>> + *	new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
>> + *
>> + *
>> + *	12) Existent mapping is kept.
>> + *	-----------------------------
>> + *
>> + *	     0        a        3
>> + *	old: |-----------------| (bo_offset=n)
>> + *
>> + *	           1  a  2
>> + *	req:       |-----|       (bo_offset=n+1)
>> + *
>> + *	     0        a        3
>> + *	new: |-----------------| (bo_offset=n)
>> + *
>> + *
>> + *	13) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	req: |-----------| (bo_offset=n)
>> + *
>> + *	     0     a     2
>> + *	new: |-----------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	14) Existent mapping is replaced.
>> + *	---------------------------------
>> + *
>> + *	           1  a  2
>> + *	old:       |-----| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	req: |----------------| (bo_offset=n)
>> + *
>> + *	     0        a       3
>> + *	new: |----------------| (bo_offset=n)
>> + *
>> + *	Note: We expect to see the same result for a request with a different bo
>> + *	      and/or non-contiguous bo_offset.
>> + *
>> + *
>> + *	15) Existent mapping is split.
>> + *	------------------------------
>> + *
>> + *	           1     a     3
>> + *	old:       |-----------| (bo_offset=n)
>> + *
>> + *	     0     b     2
>> + *	req: |-----------|       (bo_offset=m)
>> + *
>> + *	     0     b     2  a' 3
>> + *	new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>> + *
>> + *
>> + *	16) Existent mappings are merged.
>> + *	---------------------------------
>> + *
>> + *	     0     a     1
>> + *	old: |-----------|                        (bo_offset=n)
>> + *
>> + *	                            2     a     3
>> + *	old':                       |-----------| (bo_offset=n+2)
>> + *
>> + *	                1     a     2
>> + *	req:            |-----------|             (bo_offset=n+1)
>> + *
>> + *	                      a
>> + *	new: |----------------------------------| (bo_offset=n)
>> + */
>> +
>> +/**
>> + * DOC: Locking
>> + *
>> + * Generally, the GPU VA manager does not take care of locking itself; it is
>> + * the driver's responsibility to take care of locking. Drivers might want to
>> + * protect the following operations: inserting, destroying and iterating
>> + * &drm_gpuva and &drm_gpuva_region objects as well as generating split and merge
>> + * operations.
>> + *
>> + * The GPU VA manager does take care of the locking of the backing
>> + * &drm_gem_object buffers' GPU VA lists though, unless a provided function's
>> + * documentation claims otherwise.
>> + */
>> +
>> +/**
>> + * drm_gpuva_manager_init - initialize a &drm_gpuva_manager
>> + * @mgr: pointer to the &drm_gpuva_manager to initialize
>> + * @name: the name of the GPU VA space
>> + * @start_offset: the start offset of the GPU VA space
>> + * @range: the size of the GPU VA space
>> + * @reserve_offset: the start of the kernel reserved GPU VA area
>> + * @reserve_range: the size of the kernel reserved GPU VA area
>> + *
>> + * The &drm_gpuva_manager must be initialized with this function before use.
>> + *
>> + * Note that @mgr must be cleared to 0 before calling this function. The given
>> + * &name is expected to be managed by the surrounding driver structures.
>> + */
>> +void
>> +drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>> +		       const char *name,
>> +		       u64 start_offset, u64 range,
>> +		       u64 reserve_offset, u64 reserve_range)
>> +{
>> +	drm_mm_init(&mgr->va_mm, start_offset, range);
>> +	drm_mm_init(&mgr->region_mm, start_offset, range);
>> +
>> +	mgr->mm_start = start_offset;
>> +	mgr->mm_range = range;
>> +
>> +	mgr->name = name ? name : "unknown";
>> +
>> +	memset(&mgr->kernel_alloc_node, 0, sizeof(struct drm_mm_node));
>> +	mgr->kernel_alloc_node.start = reserve_offset;
>> +	mgr->kernel_alloc_node.size = reserve_range;
>> +	drm_mm_reserve_node(&mgr->region_mm, &mgr->kernel_alloc_node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_manager_init);
>> +
>> +/**
>> + * drm_gpuva_manager_destroy - cleanup a &drm_gpuva_manager
>> + * @mgr: pointer to the &drm_gpuva_manager to clean up
>> + *
>> + * Note that it is a bug to call this function on a manager that still
>> + * holds GPU VA mappings.
>> + */
>> +void
>> +drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr)
>> +{
>> +	mgr->name = NULL;
>> +	drm_mm_remove_node(&mgr->kernel_alloc_node);
>> +	drm_mm_takedown(&mgr->va_mm);
>> +	drm_mm_takedown(&mgr->region_mm);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_manager_destroy);
>> +
>> +static struct drm_gpuva_region *
>> +drm_gpuva_in_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +
>> +	/* Find the VA region the requested range is strictly enclosed by. */
>> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range) {
>> +		if (reg->node.start <= addr &&
>> +		    reg->node.start + reg->node.size >= addr + range &&
>> +		    &reg->node != &mgr->kernel_alloc_node)
>> +			return reg;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static bool
>> +drm_gpuva_in_any_region(struct drm_gpuva_manager *mgr, u64 addr, u64 range)
>> +{
>> +	return !!drm_gpuva_in_region(mgr, addr, range);
>> +}
>> +
>> +/**
>> + * drm_gpuva_insert - insert a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>> + * @va: the &drm_gpuva to insert
>> + * @addr: the start address of the GPU VA
>> + * @range: the range of the GPU VA
>> + *
>> + * Insert a &drm_gpuva with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>> +		 struct drm_gpuva *va,
>> +		 u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +	int ret;
>> +
>> +	if (!va->gem.obj)
>> +		return -EINVAL;
>> +
>> +	reg = drm_gpuva_in_region(mgr, addr, range);
>> +	if (!reg)
>> +		return -EINVAL;
>> +
>> +	ret = drm_mm_insert_node_in_range(&mgr->va_mm, &va->node,
>> +					  range, 0,
>> +					  0, addr,
>> +					  addr + range,
>> +					  DRM_MM_INSERT_LOW|DRM_MM_INSERT_ONCE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	va->mgr = mgr;
>> +	va->region = reg;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_insert);
>> +
>> +/**
>> + * drm_gpuva_link_locked - link a &drm_gpuva
>> + * @va: the &drm_gpuva to link
>> + *
>> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_link_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +	list_add_tail(&va->head, &va->gem.obj->gpuva.list);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_link_locked);
>> +
>> +/**
>> + * drm_gpuva_link_unlocked - link a &drm_gpuva
>> + * @va: the &drm_gpuva to link
>> + *
>> + * This adds the given &va to the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_link_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	drm_gpuva_link_locked(va);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_link_unlocked);
>> +
>> +/**
>> + * drm_gpuva_unlink_locked - unlink a &drm_gpuva
>> + * @va: the &drm_gpuva to unlink
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_unlink_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +	list_del_init(&va->head);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_unlink_locked);
>> +
>> +/**
>> + * drm_gpuva_unlink_unlocked - unlink a &drm_gpuva
>> + * @va: the &drm_gpuva to unlink
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_unlink_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	drm_gpuva_unlink_locked(va);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_unlink_unlocked);
>> +
>> +/**
>> + * drm_gpuva_destroy_locked - destroy a &drm_gpuva
>> + * @va: the &drm_gpuva to destroy
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with and removes it from the underlying range allocator.
>> + *
>> + * The function assumes the caller already holds the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_destroy_locked(struct drm_gpuva *va)
>> +{
>> +	lockdep_assert_held(&va->gem.obj->gpuva.mutex);
>> +
>> +	list_del(&va->head);
>> +	drm_mm_remove_node(&va->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_destroy_locked);
>> +
>> +/**
>> + * drm_gpuva_destroy_unlocked - destroy a &drm_gpuva
>> + * @va: the &drm_gpuva to destroy
>> + *
>> + * This removes the given &va from the GPU VA list of the &drm_gem_object it is
>> + * associated with and removes it from the underlying range allocator.
>> + *
>> + * The function assumes the caller does not hold the &drm_gem_object's
>> + * GPU VA list mutex.
>> + */
>> +void
>> +drm_gpuva_destroy_unlocked(struct drm_gpuva *va)
>> +{
>> +	drm_gem_gpuva_lock(va->gem.obj);
>> +	list_del(&va->head);
>> +	drm_gem_gpuva_unlock(va->gem.obj);
>> +
>> +	drm_mm_remove_node(&va->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_destroy_unlocked);
>> +
>> +/**
>> + * drm_gpuva_find - find a &drm_gpuva
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuvas address
>> + * @range: the &drm_gpuvas range
>> + *
>> + * Returns: the &drm_gpuva at a given &addr and with a given &range
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find(struct drm_gpuva_manager *mgr,
>> +	       u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva *va;
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, addr, range) {
>> +		if (va->node.start == addr &&
>> +		    va->node.size == range)
>> +			return va;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find);
>> +
>> +/**
>> + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @start: the given GPU VA's start address
>> + *
>> + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
>> +{
>> +	struct drm_mm_node *node;
>> +
>> +	if (start <= mgr->mm_start ||
>> +	    start > (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	node = __drm_mm_interval_first(&mgr->va_mm, start - 1, start);
>> +	if (node == &mgr->va_mm.head_node)
>> +		return NULL;
>> +
>> +	return (struct drm_gpuva *)node;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_prev);
>> +
>> +/**
>> + * drm_gpuva_find_next - find the &drm_gpuva after the given address
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @end: the given GPU VA's end address
>> + *
>> + * Find the adjacent &drm_gpuva after the GPU VA with given &end address.
>> + *
>> + * Note that if there is any free space between the GPU VA mappings no mapping
>> + * is returned.
>> + *
>> + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
>> + */
>> +struct drm_gpuva *
>> +drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end)
>> +{
>> +	struct drm_mm_node *node;
>> +
>> +	if (end < mgr->mm_start ||
>> +	    end >= (mgr->mm_start + mgr->mm_range))
>> +		return NULL;
>> +
>> +	node = __drm_mm_interval_first(&mgr->va_mm, end, end + 1);
>> +	if (node == &mgr->va_mm.head_node)
>> +		return NULL;
>> +
>> +	return (struct drm_gpuva *)node;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_find_next);
>> +
>> +/**
>> + * drm_gpuva_region_insert - insert a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to insert the &drm_gpuva in
>> + * @reg: the &drm_gpuva_region to insert
>> + * @addr: the start address of the GPU VA
>> + * @range: the range of the GPU VA
>> + *
>> + * Insert a &drm_gpuva_region with a given address and range into a
>> + * &drm_gpuva_manager.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +int
>> +drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			struct drm_gpuva_region *reg,
>> +			u64 addr, u64 range)
>> +{
>> +	int ret;
>> +
>> +	ret = drm_mm_insert_node_in_range(&mgr->region_mm, &reg->node,
>> +					  range, 0,
>> +					  0, addr,
>> +					  addr + range,
>> +					  DRM_MM_INSERT_LOW|
>> +					  DRM_MM_INSERT_ONCE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	reg->mgr = mgr;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_insert);
>> +
>> +/**
>> + * drm_gpuva_region_destroy - destroy a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager holding the region
>> + * @reg: the &drm_gpuva to destroy
>> + *
>> + * This removes the given &reg from the underlying range allocator.
>> + */
>> +void
>> +drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>> +			 struct drm_gpuva_region *reg)
>> +{
>> +	struct drm_gpuva *va;
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr,
>> +				       reg->node.start,
>> +				       reg->node.size) {
>> +		WARN(1, "GPU VA region must be empty on destroy.\n");
>> +		return;
>> +	}
>> +
>> +	if (&reg->node == &mgr->kernel_alloc_node) {
>> +		WARN(1, "Can't destroy kernel reserved region.\n");
>> +		return;
>> +	}
>> +
>> +	drm_mm_remove_node(&reg->node);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_destroy);
>> +
>> +/**
>> + * drm_gpuva_region_find - find a &drm_gpuva_region
>> + * @mgr: the &drm_gpuva_manager to search in
>> + * @addr: the &drm_gpuva_regions address
>> + * @range: the &drm_gpuva_regions range
>> + *
>> + * Returns: the &drm_gpuva_region at a given &addr and with a given &range
>> + */
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>> +		      u64 addr, u64 range)
>> +{
>> +	struct drm_gpuva_region *reg;
>> +
>> +	drm_gpuva_for_each_region_in_range(reg, mgr, addr, addr + range)
>> +		if (reg->node.start == addr &&
>> +		    reg->node.size == range)
>> +			return reg;
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_region_find);
>> +
>> +static int
>> +gpuva_op_map_new(struct drm_gpuva_op **pop,
>> +		 u64 addr, u64 range,
>> +		 struct drm_gem_object *obj, u64 offset)
>> +{
>> +	struct drm_gpuva_op *op;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_MAP;
>> +	op->map.va.addr = addr;
>> +	op->map.va.range = range;
>> +	op->map.gem.obj = obj;
>> +	op->map.gem.offset = offset;
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +gpuva_op_remap_new(struct drm_gpuva_op **pop,
>> +		   struct drm_gpuva_op_map *prev,
>> +		   struct drm_gpuva_op_map *next,
>> +		   struct drm_gpuva_op_unmap *unmap)
>> +{
>> +	struct drm_gpuva_op *op;
>> +	struct drm_gpuva_op_remap *r;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_REMAP;
>> +	r = &op->remap;
>> +
>> +	if (prev) {
>> +		r->prev = kmemdup(prev, sizeof(*prev), GFP_KERNEL);
>> +		if (!r->prev)
>> +			goto err_free_op;
>> +	}
>> +
>> +	if (next) {
>> +		r->next = kmemdup(next, sizeof(*next), GFP_KERNEL);
>> +		if (!r->next)
>> +			goto err_free_prev;
>> +	}
>> +
>> +	r->unmap = kmemdup(unmap, sizeof(*unmap), GFP_KERNEL);
>> +	if (!r->unmap)
>> +		goto err_free_next;
>> +
>> +	return 0;
>> +
>> +err_free_next:
>> +	if (next)
>> +		kfree(r->next);
>> +err_free_prev:
>> +	if (prev)
>> +		kfree(r->prev);
>> +err_free_op:
>> +	kfree(op);
>> +	*pop = NULL;
>> +
>> +	return -ENOMEM;
>> +}
>> +
>> +static int
>> +gpuva_op_unmap_new(struct drm_gpuva_op **pop,
>> +		   struct drm_gpuva *va, bool merge)
>> +{
>> +	struct drm_gpuva_op *op;
>> +
>> +	op = *pop = kzalloc(sizeof(*op), GFP_KERNEL);
>> +	if (!op)
>> +		return -ENOMEM;
>> +
>> +	op->op = DRM_GPUVA_OP_UNMAP;
>> +	op->unmap.va = va;
>> +	op->unmap.keep = merge;
>> +
>> +	return 0;
>> +}
>> +
>> +#define op_map_new_to_list(_ops, _addr, _range,		\
>> +			   _obj, _offset)		\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_map_new(&op, _addr, _range,	\
>> +			       _obj, _offset);		\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +#define op_remap_new_to_list(_ops, _prev, _next,	\
>> +			     _unmap)			\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_remap_new(&op, _prev, _next,	\
>> +				 _unmap);		\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +#define op_unmap_new_to_list(_ops, _gpuva, _merge)	\
>> +do {							\
>> +	struct drm_gpuva_op *op;			\
>> +							\
>> +	ret = gpuva_op_unmap_new(&op, _gpuva, _merge);	\
>> +	if (ret)					\
>> +		goto err_free_ops;			\
>> +							\
>> +	list_add_tail(&op->entry, _ops);		\
>> +} while (0)
>> +
>> +/**
>> + * drm_gpuva_sm_map_ops_create - creates the &drm_gpuva_ops to split and merge
>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>> + * @req_addr: the start address of the new mapping
>> + * @req_range: the range of the new mapping
>> + * @req_obj: the &drm_gem_object to map
>> + * @req_offset: the offset within the &drm_gem_object
>> + *
>> + * This function creates a list of operations to perform splitting and merging
>> + * of existent mapping(s) with the newly requested one.
>> + *
>> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>> + * in the given order. It can contain map, unmap and remap operations, but it
>> + * also can be empty if no operation is required, e.g. if the requested mapping
>> + * already exists in the exact same way.
>> + *
>> + * There can be an arbitrary amount of unmap operations, a maximum of two remap
>> + * operations and a single map operation. The latter one, if existent,
>> + * represents the original map operation requested by the caller. Please note
>> + * that the map operation might have been modified, e.g. if it was
>> + * merged with an existent mapping.
>> + *
>> + * Note that before calling this function again with another mapping request it
>> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>> + * The previously obtained operations must be either processed or abandoned.
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * After the caller finished processing the returned &drm_gpuva_ops, they must
>> + * be freed with &drm_gpuva_ops_free.
>> + *
>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>> + */
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>> +			    u64 req_addr, u64 req_range,
>> +			    struct drm_gem_object *req_obj, u64 req_offset)
>> +{
>> +	struct drm_gpuva_ops *ops;
>> +	struct drm_gpuva *va, *prev = NULL;
>> +	u64 req_end = req_addr + req_range;
>> +	bool skip_pmerge = false, skip_nmerge = false;
>> +	int ret;
>> +
>> +	if (!drm_gpuva_in_any_region(mgr, req_addr, req_range))
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>> +	if (!ops)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	INIT_LIST_HEAD(&ops->list);
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +
>> +		/* Generally, we want to skip merging with potential mappings
>> +		 * left and right of the requested one when we found a
>> +		 * collision, since merging happens in this loop already.
>> +		 *
>> +		 * However, there is one exception when the requested mapping
>> +		 * spans into a free VM area. If this is the case we might
>> +		 * still hit the boundary of another mapping before and/or
>> +		 * after the free VM area.
>> +		 */
>> +		skip_pmerge = true;
>> +		skip_nmerge = true;
>> +
>> +		if (addr == req_addr) {
>> +			bool merge = obj == req_obj &&
>> +				     offset == req_offset;
>> +			if (end == req_end) {
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_unmap_new_to_list(&ops->list, va, false);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				skip_nmerge = false;
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = range - req_range,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + req_range,
>> +				};
>> +				struct drm_gpuva_op_unmap u = { .va = va };
>> +
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>> +				break;
>> +			}
>> +		} else if (addr < req_addr) {
>> +			u64 ls_range = req_addr - addr;
>> +			struct drm_gpuva_op_map p = {
>> +				.va.addr = addr,
>> +				.va.range = ls_range,
>> +				.gem.obj = obj,
>> +				.gem.offset = offset,
>> +			};
>> +			struct drm_gpuva_op_unmap u = { .va = va };
>> +			bool merge = obj == req_obj &&
>> +				     offset + ls_range == req_offset;
>> +
>> +			if (end == req_end) {
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				u64 new_addr = addr;
>> +				u64 new_range = req_range + ls_range;
>> +				u64 new_offset = offset;
>> +
>> +				/* We validated that the requested mapping is
>> +				 * within a single VA region already.
>> +				 * Since it overlaps the current mapping (which
>> +				 * can't cross a VA region boundary) we can be
>> +				 * sure that we're still within the boundaries
>> +				 * of the same VA region after merging.
>> +				 */
>> +				if (merge) {
>> +					req_offset = new_offset;
>> +					req_addr = new_addr;
>> +					req_range = new_range;
>> +					op_unmap_new_to_list(&ops->list, va, true);
>> +					goto next;
>> +				}
>> +
>> +				op_remap_new_to_list(&ops->list, &p, NULL, &u);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = end - req_end,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + ls_range +
>> +						      req_range,
>> +				};
>> +
>> +				if (merge)
>> +					goto done;
>> +
>> +				op_remap_new_to_list(&ops->list, &p, &n, &u);
>> +				break;
>> +			}
>> +		} else if (addr > req_addr) {
>> +			bool merge = obj == req_obj &&
>> +				     offset == req_offset +
>> +					       (addr - req_addr);
>> +			if (!prev)
>> +				skip_pmerge = false;
>> +
>> +			if (end == req_end) {
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				break;
>> +			}
>> +
>> +			if (end < req_end) {
>> +				skip_nmerge = false;
>> +				op_unmap_new_to_list(&ops->list, va, merge);
>> +				goto next;
>> +			}
>> +
>> +			if (end > req_end) {
>> +				struct drm_gpuva_op_map n = {
>> +					.va.addr = req_end,
>> +					.va.range = end - req_end,
>> +					.gem.obj = obj,
>> +					.gem.offset = offset + req_end - addr,
>> +				};
>> +				struct drm_gpuva_op_unmap u = { .va = va };
>> +				u64 new_end = end;
>> +				u64 new_range = new_end - req_addr;
>> +
>> +				/* We validated that the requested mapping is
>> +				 * within a single VA region already.
>> +				 * Since it overlaps the current mapping (which
>> +				 * can't cross a VA region boundary) we can be
>> +				 * sure that we're still within the boundaries
>> +				 * of the same VA region after merging.
>> +				 */
>> +				if (merge) {
>> +					req_end = new_end;
>> +					req_range = new_range;
>> +					op_unmap_new_to_list(&ops->list, va, true);
>> +					break;
>> +				}
>> +
>> +				op_remap_new_to_list(&ops->list, NULL, &n, &u);
>> +				break;
>> +			}
>> +		}
>> +next:
>> +		prev = va;
>> +	}
>> +
>> +	va = skip_pmerge ? NULL : drm_gpuva_find_prev(mgr, req_addr);
>> +	if (va) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 new_offset = offset;
>> +		u64 new_addr = addr;
>> +		u64 new_range = req_range + range;
>> +		bool merge = obj == req_obj &&
>> +			     offset + range == req_offset;
>> +
>> +		/* Don't merge over VA region boundaries. */
>> +		merge &= drm_gpuva_in_any_region(mgr, new_addr, new_range);
>> +		if (merge) {
>> +			op_unmap_new_to_list(&ops->list, va, true);
>> +
>> +			req_offset = new_offset;
>> +			req_addr = new_addr;
>> +			req_range = new_range;
>> +		}
>> +	}
>> +
>> +	va = skip_nmerge ? NULL : drm_gpuva_find_next(mgr, req_end);
>> +	if (va) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +		u64 new_range = req_range + range;
>> +		u64 new_end = end;
>> +		bool merge = obj == req_obj &&
>> +			     offset == req_offset + req_range;
>> +
>> +		/* Don't merge over VA region boundaries. */
>> +		merge &= drm_gpuva_in_any_region(mgr, req_addr, new_range);
>> +		if (merge) {
>> +			op_unmap_new_to_list(&ops->list, va, true);
>> +
>> +			req_range = new_range;
>> +			req_end = new_end;
>> +		}
>> +	}
>> +
>> +	op_map_new_to_list(&ops->list,
>> +			   req_addr, req_range,
>> +			   req_obj, req_offset);
>> +
>> +done:
>> +	return ops;
>> +
>> +err_free_ops:
>> +	drm_gpuva_ops_free(ops);
>> +	return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_sm_map_ops_create);
>> +
>> +#undef op_map_new_to_list
>> +#undef op_remap_new_to_list
>> +#undef op_unmap_new_to_list
>> +
>> +/**
>> + * drm_gpuva_sm_unmap_ops_create - creates the &drm_gpuva_ops to split on unmap
>> + * @mgr: the &drm_gpuva_manager representing the GPU VA space
>> + * @req_addr: the start address of the range to unmap
>> + * @req_range: the range of the mappings to unmap
>> + *
>> + * This function creates a list of operations to perform unmapping and, if
>> + * required, splitting of the mappings overlapping the unmap range.
>> + *
>> + * The list can be iterated with &drm_gpuva_for_each_op and must be processed
>> + * in the given order. It can contain unmap and remap operations, depending on
>> + * whether there are actual overlapping mappings to split.
>> + *
>> + * There can be an arbitrary amount of unmap operations and a maximum of two
>> + * remap operations.
>> + *
>> + * Note that before calling this function again with another range to unmap it
>> + * is necessary to update the &drm_gpuva_manager's view of the GPU VA space.
>> + * The previously obtained operations must be processed or abandoned.
>> + * To update the &drm_gpuva_manager's view of the GPU VA space
>> + * drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
>> + * drm_gpuva_destroy_unlocked() should be used.
>> + *
>> + * After the caller finished processing the returned &drm_gpuva_ops, they must
>> + * be freed with &drm_gpuva_ops_free.
>> + *
>> + * Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
>> + */
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>> +			      u64 req_addr, u64 req_range)
>> +{
>> +	struct drm_gpuva_ops *ops;
>> +	struct drm_gpuva_op *op;
>> +	struct drm_gpuva_op_remap *r;
>> +	struct drm_gpuva *va;
>> +	u64 req_end = req_addr + req_range;
>> +	int ret;
>> +
>> +	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
>> +	if (!ops)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	INIT_LIST_HEAD(&ops->list);
>> +
>> +	drm_gpuva_for_each_va_in_range(va, mgr, req_addr, req_end) {
>> +		struct drm_gem_object *obj = va->gem.obj;
>> +		u64 offset = va->gem.offset;
>> +		u64 addr = va->node.start;
>> +		u64 range = va->node.size;
>> +		u64 end = addr + range;
>> +
>> +		op = kzalloc(sizeof(*op), GFP_KERNEL);
>> +		if (!op) {
>> +			ret = -ENOMEM;
>> +			goto err_free_ops;
>> +		}
>> +
>> +		r = &op->remap;
>> +
>> +		if (addr < req_addr) {
>> +			r->prev = kzalloc(sizeof(*r->prev), GFP_KERNEL);
>> +			if (!r->prev) {
>> +				ret = -ENOMEM;
>> +				goto err_free_op;
>> +			}
>> +
>> +			r->prev->va.addr = addr;
>> +			r->prev->va.range = req_addr - addr;
>> +			r->prev->gem.obj = obj;
>> +			r->prev->gem.offset = offset;
>> +		}
>> +
>> +		if (end > req_end) {
>> +			r->next = kzalloc(sizeof(*r->next), GFP_KERNEL);
>> +			if (!r->next) {
>> +				ret = -ENOMEM;
>> +				goto err_free_prev;
>> +			}
>> +
>> +			r->next->va.addr = req_end;
>> +			r->next->va.range = end - req_end;
>> +			r->next->gem.obj = obj;
>> +			r->next->gem.offset = offset + (req_end - addr);
>> +		}
>> +
>> +		if (op->remap.prev || op->remap.next) {
>> +			op->op = DRM_GPUVA_OP_REMAP;
>> +			r->unmap = kzalloc(sizeof(*r->unmap), GFP_KERNEL);
>> +			if (!r->unmap) {
>> +				ret = -ENOMEM;
>> +				goto err_free_next;
>> +			}
>> +
>> +			r->unmap->va = va;
>> +		} else {
>> +			op->op = DRM_GPUVA_OP_UNMAP;
>> +			op->unmap.va = va;
>> +		}
>> +
>> +		list_add_tail(&op->entry, &ops->list);
>> +	}
>> +
>> +	return ops;
>> +
>> +err_free_next:
>> +	kfree(r->next);
>> +err_free_prev:
>> +	kfree(r->prev);
>> +err_free_op:
>> +	kfree(op);
>> +err_free_ops:
>> +	drm_gpuva_ops_free(ops);
>> +	return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_sm_unmap_ops_create);
>> +
>> +/**
>> + * drm_gpuva_ops_free - free the given &drm_gpuva_ops
>> + * @ops: the &drm_gpuva_ops to free
>> + *
>> + * Frees the given &drm_gpuva_ops structure including all the ops associated
>> + * with it.
>> + */
>> +void
>> +drm_gpuva_ops_free(struct drm_gpuva_ops *ops)
>> +{
>> +	struct drm_gpuva_op *op, *next;
>> +
>> +	drm_gpuva_for_each_op_safe(op, next, ops) {
>> +		list_del(&op->entry);
>> +		if (op->op == DRM_GPUVA_OP_REMAP) {
>> +			kfree(op->remap.prev);
>> +			kfree(op->remap.next);
>> +
>> +			kfree(op->remap.unmap);
>> +		}
>> +		kfree(op);
>> +	}
>> +
>> +	kfree(ops);
>> +}
>> +EXPORT_SYMBOL(drm_gpuva_ops_free);
>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>> index d7c521e8860f..6feacd93aca6 100644
>> --- a/include/drm/drm_drv.h
>> +++ b/include/drm/drm_drv.h
>> @@ -104,6 +104,12 @@ enum drm_driver_feature {
>>   	 * acceleration should be handled by two drivers that are connected using auxiliary bus.
>>   	 */
>>   	DRIVER_COMPUTE_ACCEL            = BIT(7),
>> +	/**
>> +	 * @DRIVER_GEM_GPUVA:
>> +	 *
>> +	 * Driver supports user defined GPU VA bindings for GEM objects.
>> +	 */
>> +	DRIVER_GEM_GPUVA		= BIT(8),
>>   
>>   	/* IMPORTANT: Below are all the legacy flags, add new ones above. */
>>   
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 772a4adf5287..4a3679034966 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -36,6 +36,8 @@
>>   
>>   #include <linux/kref.h>
>>   #include <linux/dma-resv.h>
>> +#include <linux/list.h>
>> +#include <linux/mutex.h>
>>   
>>   #include <drm/drm_vma_manager.h>
>>   
>> @@ -337,6 +339,17 @@ struct drm_gem_object {
>>   	 */
>>   	struct dma_resv _resv;
>>   
>> +	/**
>> +	 * @gpuva:
>> +	 *
>> +	 * Provides the list and list mutex of GPU VAs attached to this
>> +	 * GEM object.
>> +	 */
>> +	struct {
>> +		struct list_head list;
>> +		struct mutex mutex;
>> +	} gpuva;
>> +
>>   	/**
>>   	 * @funcs:
>>   	 *
>> @@ -479,4 +492,66 @@ void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj);
>>   unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
>>   			       bool (*shrink)(struct drm_gem_object *obj));
>>   
>> +/**
>> + * drm_gem_gpuva_init - initialize the gpuva list of a GEM object
>> + * @obj: the &drm_gem_object
>> + *
>> + * This initializes the &drm_gem_object's &drm_gpuva list and the mutex
>> + * protecting it.
>> + *
>> + * Calling this function is only necessary for drivers intending to support the
>> + * &drm_driver_feature DRIVER_GEM_GPUVA.
>> + */
>> +static inline void drm_gem_gpuva_init(struct drm_gem_object *obj)
>> +{
>> +	INIT_LIST_HEAD(&obj->gpuva.list);
>> +	mutex_init(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_gpuva_lock - lock the GEM's gpuva list mutex
>> + * @obj: the &drm_gem_object
>> + *
>> + * This locks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>> + */
>> +static inline void drm_gem_gpuva_lock(struct drm_gem_object *obj)
>> +{
>> +	mutex_lock(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_gpuva_unlock - unlock the GEM's gpuva list mutex
>> + * @obj: the &drm_gem_object
>> + *
>> + * This unlocks the mutex protecting the &drm_gem_object's &drm_gpuva list.
>> + */
>> +static inline void drm_gem_gpuva_unlock(struct drm_gem_object *obj)
>> +{
>> +	mutex_unlock(&obj->gpuva.mutex);
>> +}
>> +
>> +/**
>> + * drm_gem_for_each_gpuva - iterator to walk over a list of gpuvas
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gem_object.
>> + */
>> +#define drm_gem_for_each_gpuva(entry, obj) \
>> +	list_for_each_entry(entry, &obj->gpuva.list, head)
>> +
>> +/**
>> + * drm_gem_for_each_gpuva_safe - iterator to safely walk over a list of gpuvas
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @next: &next &drm_gpuva to store the next step
>> + * @obj: the &drm_gem_object the &drm_gpuvas to walk are associated with
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gem_object. It is implemented with list_for_each_entry_safe(), hence
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gem_for_each_gpuva_safe(entry, next, obj) \
>> +	list_for_each_entry_safe(entry, next, &obj->gpuva.list, head)
>> +
>>   #endif /* __DRM_GEM_H__ */
>> diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
>> new file mode 100644
>> index 000000000000..adeb0c916e91
>> --- /dev/null
>> +++ b/include/drm/drm_gpuva_mgr.h
>> @@ -0,0 +1,527 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __DRM_GPUVA_MGR_H__
>> +#define __DRM_GPUVA_MGR_H__
>> +
>> +/*
>> + * Copyright (c) 2022 Red Hat.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <drm/drm_mm.h>
>> +#include <linux/mm.h>
>> +#include <linux/rbtree.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/types.h>
>> +
>> +struct drm_gpuva_region;
>> +struct drm_gpuva;
>> +struct drm_gpuva_ops;
>> +
>> +/**
>> + * struct drm_gpuva_manager - DRM GPU VA Manager
>> + *
>> + * The DRM GPU VA Manager keeps track of a GPU's virtual address space by using
>> + * the &drm_mm range allocator. Typically, this structure is embedded in bigger
>> + * driver structures.
>> + *
>> + * Drivers can pass addresses and ranges in an arbitrary unit, e.g. bytes or
>> + * pages.
>> + *
>> + * There should be one manager instance per GPU virtual address space.
>> + */
>> +struct drm_gpuva_manager {
>> +	/**
>> +	 * @name: the name of the DRM GPU VA space
>> +	 */
>> +	const char *name;
>> +
>> +	/**
>> +	 * @mm_start: start of the VA space
>> +	 */
>> +	u64 mm_start;
>> +
>> +	/**
>> +	 * @mm_range: length of the VA space
>> +	 */
>> +	u64 mm_range;
>> +
>> +	/**
>> +	 * @region_mm: the &drm_mm range allocator to track GPU VA regions
>> +	 */
>> +	struct drm_mm region_mm;
>> +
>> +	/**
>> +	 * @va_mm: the &drm_mm range allocator to track GPU VA mappings
>> +	 */
>> +	struct drm_mm va_mm;
>> +
>> +	/**
>> +	 * @kernel_alloc_node:
>> +	 *
>> +	 * &drm_mm_node representing the address space cutout reserved for
>> +	 * the kernel
>> +	 */
>> +	struct drm_mm_node kernel_alloc_node;
>> +};
>> +
>> +void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr,
>> +			    const char *name,
>> +			    u64 start_offset, u64 range,
>> +			    u64 reserve_offset, u64 reserve_range);
>> +void drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr);
>> +
>> +/**
>> + * struct drm_gpuva_region - structure to track a portion of GPU VA space
>> + *
>> + * This structure represents a portion of a GPU's VA space and is associated
>> + * with a &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>> + *
>> + * GPU VA mappings, represented by &drm_gpuva objects, are restricted to be
>> + * placed within a &drm_gpuva_region.
>> + */
>> +struct drm_gpuva_region {
>> +	/**
>> +	 * @node: the &drm_mm_node to track the GPU VA region
>> +	 */
>> +	struct drm_mm_node node;
>> +
>> +	/**
>> +	 * @mgr: the &drm_gpuva_manager this object is associated with
>> +	 */
>> +	struct drm_gpuva_manager *mgr;
>> +
>> +	/**
>> +	 * @sparse: indicates whether this region is sparse
>> +	 */
>> +	bool sparse;
>> +};
>> +
>> +struct drm_gpuva_region *
>> +drm_gpuva_region_find(struct drm_gpuva_manager *mgr,
>> +		      u64 addr, u64 range);
>> +int drm_gpuva_region_insert(struct drm_gpuva_manager *mgr,
>> +			    struct drm_gpuva_region *reg,
>> +			    u64 addr, u64 range);
>> +void drm_gpuva_region_destroy(struct drm_gpuva_manager *mgr,
>> +			      struct drm_gpuva_region *reg);
>> +
>> +int drm_gpuva_insert(struct drm_gpuva_manager *mgr,
>> +		     struct drm_gpuva *va,
>> +		     u64 addr, u64 range);
>> +/**
>> + * drm_gpuva_for_each_region_in_range - iterator to walk over a range of nodes
>> + * @node__: &drm_gpuva_region structure to assign to in each iteration step
>> + * @gpuva__: &drm_gpuva_manager structure to walk
>> + * @start__: starting offset, the first node will overlap this
>> + * @end__: ending offset, the last node will start before this (but may overlap)
>> + *
>> + * This iterator walks over all nodes in the range allocator that lie
>> + * between @start and @end. It is implemented similarly to list_for_each(),
>> + * but is using &drm_mm's internal interval tree to accelerate the search for
>> + * the starting node, and hence isn't safe against removal of elements. It
>> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>> + * backing &drm_mm, and may even continue indefinitely.
>> + */
>> +#define drm_gpuva_for_each_region_in_range(node__, gpuva__, start__, end__) \
>> +	for (node__ = (struct drm_gpuva_region *)__drm_mm_interval_first(&(gpuva__)->region_mm, \
>> +									 (start__), (end__)-1); \
>> +	     node__->node.start < (end__); \
>> +	     node__ = (struct drm_gpuva_region *)list_next_entry(&node__->node, node_list))
>> +
>> +/**
>> + * drm_gpuva_for_each_region - iterator to walk over all regions
>> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva_region structures associated with the
>> + * &drm_gpuva_manager.
>> + */
>> +#define drm_gpuva_for_each_region(entry, gpuva) \
>> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>> +
>> +/**
>> + * drm_gpuva_for_each_region_safe - iterator to safely walk over all regions
>> + * @entry: &drm_gpuva_region structure to assign to in each iteration step
>> + * @next: &next &drm_gpuva_region to store the next step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva_region structures associated with the
>> + * &drm_gpuva_manager. It is implemented with list_for_each_entry_safe(), so
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gpuva_for_each_region_safe(entry, next, gpuva) \
>> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->region_mm), node.node_list)
>> +
>> +
>> +/**
>> + * enum drm_gpuva_flags - flags for struct drm_gpuva
>> + */
>> +enum drm_gpuva_flags {
>> +	/**
>> +	 * @DRM_GPUVA_SWAPPED: flag indicating that the &drm_gpuva is swapped
>> +	 */
>> +	DRM_GPUVA_SWAPPED = (1 << 0),
>> +};
>> +
>> +/**
>> + * struct drm_gpuva - structure to track a GPU VA mapping
>> + *
>> + * This structure represents a GPU VA mapping and is associated with a
>> + * &drm_gpuva_manager. Internally it is based on a &drm_mm_node.
>> + *
>> + * Typically, this structure is embedded in bigger driver structures.
>> + */
>> +struct drm_gpuva {
>> +	/**
>> +	 * @node: the &drm_mm_node to track the GPU VA mapping
>> +	 */
>> +	struct drm_mm_node node;
>> +
>> +	/**
>> +	 * @mgr: the &drm_gpuva_manager this object is associated with
>> +	 */
>> +	struct drm_gpuva_manager *mgr;
>> +
>> +	/**
>> +	 * @region: the &drm_gpuva_region the &drm_gpuva is mapped in
>> +	 */
>> +	struct drm_gpuva_region *region;
>> +
>> +	/**
>> +	 * @head: the &list_head to attach this object to a &drm_gem_object
>> +	 */
>> +	struct list_head head;
>> +
>> +	/**
>> +	 * @flags: the &drm_gpuva_flags for this mapping
>> +	 */
>> +	enum drm_gpuva_flags flags;
>> +
>> +	/**
>> +	 * @gem: structure containing the &drm_gem_object and its offset
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @offset: the offset within the &drm_gem_object
>> +		 */
>> +		u64 offset;
>> +
>> +		/**
>> +		 * @obj: the mapped &drm_gem_object
>> +		 */
>> +		struct drm_gem_object *obj;
>> +	} gem;
>> +};
>> +
>> +void drm_gpuva_link_locked(struct drm_gpuva *va);
>> +void drm_gpuva_link_unlocked(struct drm_gpuva *va);
>> +void drm_gpuva_unlink_locked(struct drm_gpuva *va);
>> +void drm_gpuva_unlink_unlocked(struct drm_gpuva *va);
>> +
>> +void drm_gpuva_destroy_locked(struct drm_gpuva *va);
>> +void drm_gpuva_destroy_unlocked(struct drm_gpuva *va);
>> +
>> +struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
>> +				 u64 addr, u64 range);
>> +struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start);
>> +struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuva_manager *mgr, u64 end);
>> +
>> +/**
>> + * drm_gpuva_swap - sets whether the backing BO of this &drm_gpuva is swapped
>> + * @va: the &drm_gpuva to set the swap flag of
>> + * @swap: indicates whether the &drm_gpuva is swapped
>> + */
>> +static inline void drm_gpuva_swap(struct drm_gpuva *va, bool swap)
>> +{
>> +	if (swap)
>> +		va->flags |= DRM_GPUVA_SWAPPED;
>> +	else
>> +		va->flags &= ~DRM_GPUVA_SWAPPED;
>> +}
>> +
>> +/**
>> + * drm_gpuva_swapped - indicates whether the backing BO of this &drm_gpuva
>> + * is swapped
>> + * @va: the &drm_gpuva to check
>> + */
>> +static inline bool drm_gpuva_swapped(struct drm_gpuva *va)
>> +{
>> +	return va->flags & DRM_GPUVA_SWAPPED;
>> +}
>> +
>> +/**
>> + * drm_gpuva_for_each_va_in_range - iterator to walk over a range of nodes
>> + * @node__: &drm_gpuva structure to assign to in each iteration step
>> + * @gpuva__: &drm_gpuva_manager structure to walk
>> + * @start__: starting offset, the first node will overlap this
>> + * @end__: ending offset, the last node will start before this (but may overlap)
>> + *
>> + * This iterator walks over all nodes in the range allocator that lie
>> + * between @start and @end. It is implemented similarly to list_for_each(),
>> + * but is using &drm_mm's internal interval tree to accelerate the search for
>> + * the starting node, and hence isn't safe against removal of elements. It
>> + * assumes that @end is within (or is the upper limit of) the &drm_gpuva_manager.
>> + * If [@start, @end] are beyond the range of the &drm_gpuva_manager, the
>> + * iterator may walk over the special _unallocated_ &drm_mm.head_node of the
>> + * backing &drm_mm, and may even continue indefinitely.
>> + */
>> +#define drm_gpuva_for_each_va_in_range(node__, gpuva__, start__, end__) \
>> +	for (node__ = (struct drm_gpuva *)__drm_mm_interval_first(&(gpuva__)->va_mm, \
>> +								  (start__), (end__)-1); \
>> +	     node__->node.start < (end__); \
>> +	     node__ = (struct drm_gpuva *)list_next_entry(&node__->node, node_list))
>> +
>> +/**
>> + * drm_gpuva_for_each_va - iterator to walk over all mappings
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gpuva_manager.
>> + */
>> +#define drm_gpuva_for_each_va(entry, gpuva) \
>> +	list_for_each_entry(entry, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>> +
>> +/**
>> + * drm_gpuva_for_each_va_safe - iterator to safely walk over all mappings
>> + * @entry: &drm_gpuva structure to assign to in each iteration step
>> + * @next: &next &drm_gpuva to store the next step
>> + * @gpuva: &drm_gpuva_manager structure to walk
>> + *
>> + * This iterator walks over all &drm_gpuva structures associated with the
>> + * &drm_gpuva_manager. It is implemented with list_for_each_entry_safe(), so
>> + * it is safe against removal of elements.
>> + */
>> +#define drm_gpuva_for_each_va_safe(entry, next, gpuva) \
>> +	list_for_each_entry_safe(entry, next, drm_mm_nodes(&(gpuva)->va_mm), node.node_list)
>> +
>> +/**
>> + * enum drm_gpuva_op_type - GPU VA operation type
>> + *
>> + * Operations to alter the GPU VA mappings tracked by the &drm_gpuva_manager
>> + * can be map, remap or unmap operations.
>> + */
>> +enum drm_gpuva_op_type {
>> +	/**
>> +	 * @DRM_GPUVA_OP_MAP: the map op type
>> +	 */
>> +	DRM_GPUVA_OP_MAP,
>> +
>> +	/**
>> +	 * @DRM_GPUVA_OP_REMAP: the remap op type
>> +	 */
>> +	DRM_GPUVA_OP_REMAP,
>> +
>> +	/**
>> +	 * @DRM_GPUVA_OP_UNMAP: the unmap op type
>> +	 */
>> +	DRM_GPUVA_OP_UNMAP,
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_map - GPU VA map operation
>> + *
>> + * This structure represents a single map operation generated by the
>> + * DRM GPU VA manager.
>> + */
>> +struct drm_gpuva_op_map {
>> +	/**
>> +	 * @va: structure containing address and range of a map
>> +	 * operation
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @addr: the base address of the new mapping
>> +		 */
>> +		u64 addr;
>> +
>> +		/**
>> +		 * @range: the range of the new mapping
>> +		 */
>> +		u64 range;
>> +	} va;
>> +
>> +	/**
>> +	 * @gem: structure containing the &drm_gem_object and its offset
>> +	 */
>> +	struct {
>> +		/**
>> +		 * @offset: the offset within the &drm_gem_object
>> +		 */
>> +		u64 offset;
>> +
>> +		/**
>> +		 * @obj: the &drm_gem_object to map
>> +		 */
>> +		struct drm_gem_object *obj;
>> +	} gem;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_unmap - GPU VA unmap operation
>> + *
>> + * This structure represents a single unmap operation generated by the
>> + * DRM GPU VA manager.
>> + */
>> +struct drm_gpuva_op_unmap {
>> +	/**
>> +	 * @va: the &drm_gpuva to unmap
>> +	 */
>> +	struct drm_gpuva *va;
>> +
>> +	/**
>> +	 * @keep:
>> +	 *
>> +	 * Indicates whether this &drm_gpuva is physically contiguous with the
>> +	 * original mapping request.
>> +	 *
>> +	 * Optionally, if @keep is set, drivers may keep the actual page table
>> +	 * mappings for this &drm_gpuva, adding only the missing page table
>> +	 * entries, and update the &drm_gpuva_manager accordingly.
>> +	 */
>> +	bool keep;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op_remap - GPU VA remap operation
>> + *
>> + * This represents a single remap operation generated by the DRM GPU VA manager.
>> + *
>> + * A remap operation is generated when an existing GPU VA mapping is split up
>> + * by inserting a new GPU VA mapping or by partially unmapping existing
>> + * mapping(s), hence it consists of a maximum of two map and one unmap
>> + * operation.
>> + *
>> + * The @unmap operation takes care of removing the original existing mapping.
>> + * @prev is used to remap the preceding part, @next the subsequent part.
>> + *
>> + * If either a new mapping's start address is aligned with the start address
>> + * of the old mapping or the new mapping's end address is aligned with the
>> + * end address of the old mapping, either @prev or @next is NULL.
>> + *
>> + * Note, the reason for a dedicated remap operation, rather than arbitrary
>> + * unmap and map operations, is to give drivers the chance to extract
>> + * driver-specific data for creating the new mappings from the unmap
>> + * operation's &drm_gpuva structure, which typically is embedded in larger
>> + * driver-specific structures.
>> + */
>> +struct drm_gpuva_op_remap {
>> +	/**
>> +	 * @prev: the preceding part of a split mapping
>> +	 */
>> +	struct drm_gpuva_op_map *prev;
>> +
>> +	/**
>> +	 * @next: the subsequent part of a split mapping
>> +	 */
>> +	struct drm_gpuva_op_map *next;
>> +
>> +	/**
>> +	 * @unmap: the unmap operation for the original existing mapping
>> +	 */
>> +	struct drm_gpuva_op_unmap *unmap;
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_op - GPU VA operation
>> + *
>> + * This structure represents a single generic operation, which can be either
>> + * map, unmap or remap.
>> + *
>> + * The particular type of the operation is defined by @op.
>> + */
>> +struct drm_gpuva_op {
>> +	/**
>> +	 * @entry:
>> +	 *
>> +	 * The &list_head used to link instances of this struct into the
>> +	 * &drm_gpuva_ops list.
>> +	 */
>> +	struct list_head entry;
>> +
>> +	/**
>> +	 * @op: the type of the operation
>> +	 */
>> +	enum drm_gpuva_op_type op;
>> +
>> +	union {
>> +		/**
>> +		 * @map: the map operation
>> +		 */
>> +		struct drm_gpuva_op_map map;
>> +
>> +		/**
>> +		 * @unmap: the unmap operation
>> +		 */
>> +		struct drm_gpuva_op_unmap unmap;
>> +
>> +		/**
>> +		 * @remap: the remap operation
>> +		 */
>> +		struct drm_gpuva_op_remap remap;
>> +	};
>> +};
>> +
>> +/**
>> + * struct drm_gpuva_ops - wraps a list of &drm_gpuva_op
>> + */
>> +struct drm_gpuva_ops {
>> +	/**
>> +	 * @list: the &list_head holding the list of operations
>> +	 */
>> +	struct list_head list;
>> +};
>> +
>> +/**
>> + * drm_gpuva_for_each_op - iterator to walk over all ops
>> + * @op: &drm_gpuva_op to assign in each iteration step
>> + * @ops: &drm_gpuva_ops to walk
>> + *
>> + * This iterator walks over all ops within a given list of operations.
>> + */
>> +#define drm_gpuva_for_each_op(op, ops) list_for_each_entry(op, &(ops)->list, entry)
>> +
>> +/**
>> + * drm_gpuva_for_each_op_safe - iterator to safely walk over all ops
>> + * @op: &drm_gpuva_op to assign in each iteration step
>> + * @next: &next &drm_gpuva_op to store the next step
>> + * @ops: &drm_gpuva_ops to walk
>> + *
>> + * This iterator walks over all ops within a given list of operations. It is
>> + * implemented with list_for_each_entry_safe(), so it is safe against removal
>> + * of elements.
>> + */
>> +#define drm_gpuva_for_each_op_safe(op, next, ops) \
>> +	list_for_each_entry_safe(op, next, &(ops)->list, entry)
>> +
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_map_ops_create(struct drm_gpuva_manager *mgr,
>> +			    u64 addr, u64 range,
>> +			    struct drm_gem_object *obj, u64 offset);
>> +struct drm_gpuva_ops *
>> +drm_gpuva_sm_unmap_ops_create(struct drm_gpuva_manager *mgr,
>> +			      u64 addr, u64 range);
>> +void drm_gpuva_ops_free(struct drm_gpuva_ops *ops);
>> +
>> +#endif /* __DRM_GPUVA_MGR_H__ */
>> -- 
>> 2.39.0
>>
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI
  2023-02-06 14:48                       ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Oded Gabbay
@ 2023-03-16 16:39                         ` Danilo Krummrich
  0 siblings, 0 replies; 75+ messages in thread
From: Danilo Krummrich @ 2023-03-16 16:39 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: jason, corbet, nouveau, linux-doc, linux-kernel, dri-devel,
	bskeggs, tzimmermann, airlied, Christian König,
	Matthew Brost

Hi Oded,

sorry for the late response, somehow this mail slipped through.

On 2/6/23 15:48, Oded Gabbay wrote:
> On Thu, Jan 19, 2023 at 7:24 AM Matthew Brost <matthew.brost@intel.com> wrote:
>> Is this not an application issue? Millions of mappings seems a bit
>> absurd to me.
> If I look at the most extreme case for AI, assuming 256GB of HBM
> memory and page mapping of 2MB, we get to 128K of mappings. But that's
> really the extreme case imo. I assume most mappings will be much
> larger. In fact, in the most realistic scenario of large-scale
> training, a single user will probably map the entire HBM memory using
> 1GB pages.
> 
> I have also a question, could this GPUVA code manage VA ranges
> mappings for userptr mappings, assuming we work without svm/uva/usm
> (pointer-is-a-pointer) ? Because then we are talking about possible
> 4KB mappings of 1 - 1.5 TB host server RAM (Implied in my question is
> the assumption this can be used also for non-VK use-cases. Please tell
> me if I'm totally wrong here).

In V2 I switched from drm_mm to a maple tree, which should improve
handling of large numbers of entries. I also dropped the requirement
for GPUVA entries to be backed by a valid GEM object.

I think it can be used for non-VK use cases. It basically just keeps
track of mappings; it does not allocate them in the sense of finding a
hole and handing out a base address for a given size. There are basic
functions to insert and remove entries, which ensure that colliding
entries can't be inserted and that only a specific, given entry can be
removed, rather than e.g. an arbitrary range.
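
Purely for illustration, here is a minimal sketch of the basic insert
path against the interfaces posted in this series; the my_mapping
structure and helper are made up, and details such as associating the
entry with a &drm_gpuva_region and locking are left out:

struct my_mapping {
	struct drm_gpuva va;
	/* driver specific state ... */
};

static int my_mapping_insert(struct drm_gpuva_manager *mgr,
			     struct drm_gem_object *obj,
			     u64 addr, u64 range, u64 offset)
{
	struct my_mapping *m;
	int ret;

	m = kzalloc(sizeof(*m), GFP_KERNEL);
	if (!m)
		return -ENOMEM;

	m->va.gem.obj = obj;
	m->va.gem.offset = offset;

	/* Fails if the range collides with an existing entry. */
	ret = drm_gpuva_insert(mgr, &m->va, addr, range);
	if (ret) {
		kfree(m);
		return ret;
	}

	/* Attach the mapping to the GEM object's gpuva list. */
	drm_gpuva_link_unlocked(&m->va);
	return 0;
}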

There are also more advanced functions where users of the GPUVA manager
can request to "force map" a new mapping and to unmap a given range. The
GPUVA manager will figure out the (sub-)operations to make this happen
(e.g. remove mappings that are in the way, split up mappings, etc.) and
either provide these operations (or steps) through callbacks or through
a list of operations for the caller to process, as sketched below.
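
To make the "list of operations" flavor a bit more concrete, here is a
rough sketch of how a caller could consume the ops returned by
drm_gpuva_sm_map_ops_create(); the my_driver_*() hooks are placeholders
for driver specific page table handling and are not part of this series:

static int my_vm_bind(struct drm_gpuva_manager *mgr,
		      u64 addr, u64 range,
		      struct drm_gem_object *obj, u64 offset)
{
	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;
	int ret = 0;

	ops = drm_gpuva_sm_map_ops_create(mgr, addr, range, obj, offset);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	/* The ops must be processed in the given order. */
	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:
			ret = my_driver_map(mgr, &op->map);
			break;
		case DRM_GPUVA_OP_REMAP:
			ret = my_driver_remap(mgr, &op->remap);
			break;
		case DRM_GPUVA_OP_UNMAP:
			ret = my_driver_unmap(mgr, &op->unmap);
			break;
		}
		if (ret)
			break;
	}

	/* The caller is responsible for freeing the ops. */
	drm_gpuva_ops_free(ops);
	return ret;
}

While processing the ops the driver also has to update the manager's
view of the VA space with drm_gpuva_insert() and
drm_gpuva_destroy_locked()/_unlocked(), as the kernel-doc above notes.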

Are there any other use-cases or features you could think of that would 
be beneficial for accelerators?

- Danilo

> 
> Thanks,
> Oded
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2023-03-16 16:41 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 01/14] drm: execution context for GEM buffers Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
2023-01-18  8:51   ` Christian König
2023-01-18 19:00     ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
2023-01-19  4:14   ` Bagas Sanjaya
2023-01-20 18:32     ` Danilo Krummrich
2023-01-23 23:23   ` Niranjana Vishwanathapura
2023-01-26 23:43   ` Matthew Brost
2023-01-27  0:24   ` Matthew Brost
2023-02-03 17:37   ` Matthew Brost
2023-02-06 13:35     ` Christian König
2023-02-06 13:46       ` Danilo Krummrich
2023-02-14 11:52     ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
2023-01-18 13:55   ` kernel test robot
2023-01-18 15:47   ` kernel test robot
2023-01-18  6:12 ` [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
2023-01-27  1:05   ` Matthew Brost
2023-01-27  1:26     ` Danilo Krummrich
2023-01-27  7:55       ` Christian König
2023-01-27 13:12         ` Danilo Krummrich
2023-01-27 13:23           ` Christian König
2023-01-27 14:44             ` Danilo Krummrich
2023-01-27 15:17               ` Christian König
2023-01-27 20:25                 ` David Airlie
2023-01-30 12:58                   ` Christian König
2023-01-27 21:09                 ` Danilo Krummrich
2023-01-29 18:46                   ` Danilo Krummrich
2023-01-30 13:02                     ` Christian König
2023-01-30 23:38                       ` Danilo Krummrich
2023-02-01  8:10                       ` [Nouveau] " Dave Airlie
2023-02-02 11:53                         ` Christian König
2023-02-02 18:31                           ` Danilo Krummrich
2023-02-06  9:48                             ` Christian König
2023-02-06 13:27                               ` Danilo Krummrich
2023-02-06 16:14                                 ` Christian König
2023-02-06 18:20                                   ` Danilo Krummrich
2023-02-07  9:35                                     ` Christian König
2023-02-07 10:50                                       ` Danilo Krummrich
2023-02-10 11:50                                         ` Christian König
2023-02-10 12:47                                           ` Danilo Krummrich
2023-01-27  1:43     ` Danilo Krummrich
2023-01-27  3:21       ` Matthew Brost
2023-01-27  3:33         ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 06/14] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 07/14] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 08/14] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 09/14] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 10/14] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
2023-01-20  3:37   ` kernel test robot
2023-01-18  6:12 ` [PATCH drm-next 12/14] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
2023-01-18 20:37   ` Thomas Hellström (Intel)
2023-01-19  3:44     ` Danilo Krummrich
2023-01-19  4:58       ` Matthew Brost
2023-01-19  7:32         ` Thomas Hellström (Intel)
2023-01-20 10:08         ` Boris Brezillon
2023-01-18  6:12 ` [PATCH drm-next 14/14] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
2023-01-18  8:53 ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Christian König
2023-01-18 15:34   ` Danilo Krummrich
2023-01-18 15:37     ` Christian König
2023-01-18 16:19       ` Danilo Krummrich
2023-01-18 16:30         ` Alex Deucher
2023-01-18 16:50           ` Danilo Krummrich
2023-01-18 16:54             ` Alex Deucher
2023-01-18 19:17               ` Dave Airlie
2023-01-18 19:48                 ` Christian König
2023-01-19  4:04                   ` Danilo Krummrich
2023-01-19  5:23                     ` Matthew Brost
2023-01-19 11:33                       ` drm_gpuva_manager requirements (was Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI) Christian König
2023-02-06 14:48                       ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Oded Gabbay
2023-03-16 16:39                         ` Danilo Krummrich
