* [RFC] Common DRM execution context v2
@ 2022-05-04  7:47 Christian König
  2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel

Hello everyone,

Compared to the earlier version, this has seen quite a bit of additional
testing and can now handle both amdgpu and radeon without any
performance drop.

QXL is converted over as well, and the remaining ttm_execbuf_util
implementation is then moved into VMWGFX, which is its only remaining
user.

Please review and comment,
Christian.




* [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-09 14:31     ` Daniel Vetter
  2022-06-09  0:05   ` Felix Kuehling
  2022-05-04  7:47 ` [PATCH 2/8] drm: add drm_exec selftests Christian König
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTM execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstract the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.
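
For illustration, a minimal calling sketch of the intended usage pattern
(this mirrors the example in the new Overview documentation; boA and boB
stand for GEM objects the caller has already looked up):

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		ret = drm_exec_prepare_obj(&exec, boA, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			break;

		ret = drm_exec_prepare_obj(&exec, boB, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			break;
	}
	if (!ret) {
		/* All prepared objects are locked here, fences can be added */
	}
	drm_exec_fini(&exec);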

v2: drop the xarray and use a dynamically resized array instead, the
    locking overhead is unnecessary and measurable.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   7 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 295 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 144 +++++++++++++++++
 5 files changed, 460 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index f32ccce5722d..bf7dd2a78e9b 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -493,6 +493,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e88c497fa010..1b35c10df263 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -179,6 +179,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
@@ -252,6 +258,7 @@ config DRM_AMDGPU
 	select DRM_SCHED
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select POWER_SUPPLY
 	select HWMON
 	select BACKLIGHT_CLASS_DEVICE
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 15fe3163f822..ee8573b683f3 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -37,6 +37,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..ed2106c22786
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,295 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc.).
+ *
+ * If contention is detected while locking a GEM object, the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked, fence slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Initialize the drm_exec_objects container */
+static void drm_exec_objects_init(struct drm_exec_objects *container)
+{
+	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	container->max_objects = container->objects ?
+		PAGE_SIZE / sizeof(void *) : 0;
+	container->num_objects = 0;
+}
+
+/* Cleanup the drm_exec_objects container */
+static void drm_exec_objects_fini(struct drm_exec_objects *container)
+{
+	kvfree(container->objects);
+}
+
+/* Make sure we have enough room and add an object to the container */
+static int drm_exec_objects_add(struct drm_exec_objects *container,
+				struct drm_gem_object *obj)
+{
+	if (unlikely(container->num_objects == container->max_objects)) {
+		size_t size = container->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		container->objects = tmp;
+		container->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	container->objects[container->num_objects++] = obj;
+	return 0;
+}
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_duplicate_object(exec, index, obj)
+		drm_gem_object_put(obj);
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked and duplicate
+ * objects.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	drm_exec_objects_init(&exec->locked);
+	drm_exec_objects_init(&exec->duplicates);
+	exec->contended = DRM_EXEC_DUMMY;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	drm_exec_objects_fini(&exec->locked);
+	drm_exec_objects_fini(&exec->duplicates);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Cleanup the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->locked.num_objects = 0;
+	exec->duplicates.num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the container and reserve fences */
+static int drm_exec_obj_locked(struct drm_exec_objects *container,
+			       struct drm_gem_object *obj,
+			       unsigned int num_fences)
+{
+	int ret;
+
+	if (container) {
+		ret = drm_exec_objects_add(container, obj);
+		if (ret)
+			return ret;
+	}
+
+	if (num_fences) {
+		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		if (ret)
+			goto error_erase;
+	}
+
+	return 0;
+
+error_erase:
+	if (container) {
+		--container->num_objects;
+		drm_gem_object_put(obj);
+	}
+	return ret;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
+	if (unlikely(ret))
+		dma_resv_unlock(obj->resv);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container. Duplicates
+ * are detected as well and automatically moved into the duplicates container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY)) {
+		struct drm_exec_objects *container = &exec->duplicates;
+
+		/*
+		 * If this is the first locked GEM object it was most likely
+		 * just contended. So don't add it to the duplicates, just
+		 * reserve the fence slots.
+		 */
+		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
+			container = NULL;
+
+		ret = drm_exec_obj_locked(container, obj, num_fences);
+		if (ret)
+			return ret;
+
+	} else if (unlikely(ret)) {
+		return ret;
+
+	} else {
+		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
+		if (ret)
+			goto error_unlock;
+	}
+
+	drm_gem_object_get(obj);
+	return 0;
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..f73981c6292e
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec_objects - Container for GEM objects in a drm_exec
+ */
+struct drm_exec_objects {
+	unsigned int		num_objects;
+	unsigned int		max_objects;
+	struct drm_gem_object	**objects;
+};
+
+/**
+ * drm_exec_objects_for_each - iterate over all the objects inside the container
+ */
+#define drm_exec_objects_for_each(array, index, obj)		\
+	for (index = 0, obj = (array)->objects[0];		\
+	     index < (array)->num_objects;			\
+	     ++index, obj = (array)->objects[index])
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptible
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @locked: container for the locked GEM objects
+	 */
+	struct drm_exec_objects	locked;
+
+	/**
+	 * @duplicates: container for the duplicated GEM objects
+	 */
+	struct drm_exec_objects	duplicates;
+
+	/**
+	 * @contended: contended GEM object we backed off for.
+	 */
+	struct drm_gem_object	*contended;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->locked, index, obj)
+
+/**
+ * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the duplicate GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to cleanup
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a nested loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a nested loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+/**
+ * drm_exec_has_duplicates - check for duplicated GEM objects
+ * @exec: drm_exec object
+ *
+ * Return true if the drm_exec object has encountered some already locked GEM
+ * objects while trying to lock them. This can happen if multiple GEM objects
+ * share the same underlying resv object.
+ */
+static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
+{
+	return exec->duplicates.num_objects > 0;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+
+#endif
-- 
2.25.1



* [PATCH 2/8] drm: add drm_exec selftests
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
  2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-04  7:47 ` [PATCH 3/8] drm/amdkfd: switch over to using drm_exec Christian König
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Largely just the initial skeleton.
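
As a sketch of where this could grow, a test exercising the duplicate
tracking might look roughly like this (hypothetical igt_exec_duplicate,
not part of this patch; it follows the embedded drm_device/GEM object
style of the tests added below):

static int igt_exec_duplicate(void *ignored)
{
	struct drm_gem_object gobj_a = { };
	struct drm_gem_object gobj_b = { };
	struct drm_exec exec;
	int ret;

	drm_gem_private_object_init(&dev, &gobj_a, PAGE_SIZE);
	drm_gem_private_object_init(&dev, &gobj_b, PAGE_SIZE);

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		ret = drm_exec_prepare_obj(&exec, &gobj_a, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

		ret = drm_exec_prepare_obj(&exec, &gobj_b, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

		/* Preparing gobj_b again should end up in the duplicates */
		ret = drm_exec_prepare_obj(&exec, &gobj_b, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;
	}

	ret = drm_exec_has_duplicates(&exec) ? 0 : -EINVAL;

error:
	drm_exec_fini(&exec);
	if (ret)
		pr_err("%s - err %d!\n", __func__, ret);
	else
		pr_info("%s - ok!\n", __func__);
	return ret;
}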

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/selftests/Makefile            |  2 +-
 .../gpu/drm/selftests/drm_exec_selftests.h    | 10 +++
 drivers/gpu/drm/selftests/test-drm_exec.c     | 74 +++++++++++++++++++
 3 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/selftests/drm_exec_selftests.h
 create mode 100644 drivers/gpu/drm/selftests/test-drm_exec.c

diff --git a/drivers/gpu/drm/selftests/Makefile b/drivers/gpu/drm/selftests/Makefile
index 5ba5f9138c95..0ab4b0f0642f 100644
--- a/drivers/gpu/drm/selftests/Makefile
+++ b/drivers/gpu/drm/selftests/Makefile
@@ -5,4 +5,4 @@ test-drm_modeset-y := test-drm_modeset_common.o test-drm_plane_helper.o \
 		      test-drm_rect.o
 
 obj-$(CONFIG_DRM_DEBUG_SELFTEST) += test-drm_mm.o test-drm_modeset.o test-drm_cmdline_parser.o \
-				    test-drm_buddy.o
+				    test-drm_buddy.o test-drm_exec.o
diff --git a/drivers/gpu/drm/selftests/drm_exec_selftests.h b/drivers/gpu/drm/selftests/drm_exec_selftests.h
new file mode 100644
index 000000000000..d88ec9c85fe6
--- /dev/null
+++ b/drivers/gpu/drm/selftests/drm_exec_selftests.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* List each unit test as selftest(name, function)
+ *
+ * The name is used as both an enum and expanded as igt__name to create
+ * a module parameter. It must be unique and legal for a C identifier.
+ *
+ * Tests are executed in order by igt/drm_exec
+ */
+selftest(sanitycheck, igt_sanitycheck) /* keep first (selfcheck for igt) */
+selftest(exec_lock1, igt_exec_lock1)
diff --git a/drivers/gpu/drm/selftests/test-drm_exec.c b/drivers/gpu/drm/selftests/test-drm_exec.c
new file mode 100644
index 000000000000..de2c3d986828
--- /dev/null
+++ b/drivers/gpu/drm/selftests/test-drm_exec.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#define pr_fmt(fmt) "drm_exec: " fmt
+
+#include <linux/module.h>
+#include <linux/prime_numbers.h>
+
+#include <drm/drm_exec.h>
+#include <drm/drm_device.h>
+#include <drm/drm_gem.h>
+
+#include "../lib/drm_random.h"
+
+#define TESTS "drm_exec_selftests.h"
+#include "drm_selftest.h"
+
+static	struct drm_device dev;
+
+static int igt_sanitycheck(void *ignored)
+{
+	struct drm_exec exec;
+
+	drm_exec_init(&exec, true);
+	drm_exec_fini(&exec);
+	pr_info("%s - ok!\n", __func__);
+	return 0;
+}
+
+static int igt_exec_lock1(void *ignored)
+{
+	struct drm_gem_object gobj = { };
+	struct drm_exec exec;
+	int ret;
+
+	drm_gem_private_object_init(&dev, &gobj, PAGE_SIZE);
+
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		ret = drm_exec_prepare_obj(&exec, &gobj, 1);
+		drm_exec_continue_on_contention(&exec);
+		if (ret) {
+			drm_exec_fini(&exec);
+			pr_err("%s - err %d!\n", __func__, ret);
+			return ret;
+		}
+	}
+	drm_exec_fini(&exec);
+	pr_info("%s - ok!\n", __func__);
+	return 0;
+}
+
+#include "drm_selftest.c"
+
+static int __init test_drm_exec_init(void)
+{
+	int err;
+
+	err = run_selftests(selftests, ARRAY_SIZE(selftests), NULL);
+
+	return err > 0 ? 0 : err;
+}
+
+static void __exit test_drm_exec_exit(void)
+{
+}
+
+module_init(test_drm_exec_init);
+module_exit(test_drm_exec_exit);
+
+MODULE_AUTHOR("AMD");
+MODULE_LICENSE("GPL and additional rights");
-- 
2.25.1



* [PATCH 3/8] drm/amdkfd: switch over to using drm_exec
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
  2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
  2022-05-04  7:47 ` [PATCH 2/8] drm: add drm_exec selftests Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-06-09  0:04   ` Felix Kuehling
  2022-05-04  7:47 ` [PATCH 4/8] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Avoids quite a bit of logic and kmalloc overhead.
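
For reference, the nested-loop pattern used in this conversion relies on
drm_exec_break_on_contention() in the inner loop plus
drm_exec_continue_on_contention() in the outer loop; roughly (condensed
sketch of the scheme used below, error paths trimmed; entry and head are
placeholders):

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		list_for_each_entry(entry, &head, list) {
			ret = drm_exec_prepare_obj(&exec, &entry->bo->tbo.base, 1);
			drm_exec_break_on_contention(&exec);
			if (unlikely(ret))
				goto error;
		}
		drm_exec_continue_on_contention(&exec);
	}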

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   5 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 303 +++++++-----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  14 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  32 +-
 5 files changed, 152 insertions(+), 205 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4cb14c2fe53f..3f3a994c68e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -25,12 +25,12 @@
 #ifndef AMDGPU_AMDKFD_H_INCLUDED
 #define AMDGPU_AMDKFD_H_INCLUDED
 
+#include <linux/list.h>
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/kthread.h>
 #include <linux/workqueue.h>
 #include <kgd_kfd_interface.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 #include "amdgpu_sync.h"
 #include "amdgpu_vm.h"
 
@@ -66,8 +66,7 @@ struct kgd_mem {
 	struct dma_buf *dmabuf;
 	struct list_head attachments;
 	/* protected by amdkfd_process_info.lock */
-	struct ttm_validate_buffer validate_list;
-	struct ttm_validate_buffer resv_list;
+	struct list_head validate_list;
 	uint32_t domain;
 	unsigned int mapped_to_gpu_memory;
 	uint64_t va;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3dc5ab2764ff..64ac4f8f49be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -25,6 +25,8 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/task.h>
 
+#include <drm/drm_exec.h>
+
 #include "amdgpu_object.h"
 #include "amdgpu_gem.h"
 #include "amdgpu_vm.h"
@@ -770,28 +772,19 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
 				struct amdkfd_process_info *process_info,
 				bool userptr)
 {
-	struct ttm_validate_buffer *entry = &mem->validate_list;
-	struct amdgpu_bo *bo = mem->bo;
-
-	INIT_LIST_HEAD(&entry->head);
-	entry->num_shared = 1;
-	entry->bo = &bo->tbo;
-	mutex_lock(&process_info->lock);
 	if (userptr)
-		list_add_tail(&entry->head, &process_info->userptr_valid_list);
+		list_add_tail(&mem->validate_list,
+			      &process_info->userptr_valid_list);
 	else
-		list_add_tail(&entry->head, &process_info->kfd_bo_list);
+		list_add_tail(&mem->validate_list, &process_info->kfd_bo_list);
 	mutex_unlock(&process_info->lock);
 }
 
 static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem *mem,
 		struct amdkfd_process_info *process_info)
 {
-	struct ttm_validate_buffer *bo_list_entry;
-
-	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
-	list_del(&bo_list_entry->head);
+	list_del(&mem->validate_list);
 	mutex_unlock(&process_info->lock);
 }
 
@@ -875,13 +868,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
  * object can track VM updates.
  */
 struct bo_vm_reservation_context {
-	struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
-	unsigned int n_vms;		    /* Number of VMs reserved	    */
-	struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
-	struct ww_acquire_ctx ticket;	    /* Reservation ticket	    */
-	struct list_head list, duplicates;  /* BO lists			    */
-	struct amdgpu_sync *sync;	    /* Pointer to sync object	    */
-	bool reserved;			    /* Whether BOs are reserved	    */
+	/* DRM execution context for the reservation */
+	struct drm_exec exec;
+	/* Number of VMs reserved */
+	unsigned int n_vms;
+	/* Pointer to sync object */
+	struct amdgpu_sync *sync;
 };
 
 enum bo_vm_match {
@@ -905,35 +897,24 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 
 	WARN_ON(!vm);
 
-	ctx->reserved = false;
 	ctx->n_vms = 1;
 	ctx->sync = &mem->sync;
-
-	INIT_LIST_HEAD(&ctx->list);
-	INIT_LIST_HEAD(&ctx->duplicates);
-
-	ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
-	if (!ctx->vm_pd)
-		return -ENOMEM;
-
-	ctx->kfd_bo.priority = 0;
-	ctx->kfd_bo.tv.bo = &bo->tbo;
-	ctx->kfd_bo.tv.num_shared = 1;
-	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
-
-	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
-
-	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-				     false, &ctx->duplicates);
-	if (ret) {
-		pr_err("Failed to reserve buffers in ttm.\n");
-		kfree(ctx->vm_pd);
-		ctx->vm_pd = NULL;
-		return ret;
+	drm_exec_init(&ctx->exec, true);
+	drm_exec_while_not_all_locked(&ctx->exec) {
+		ret = amdgpu_vm_lock_pd(vm, &ctx->exec);
+		if (likely(!ret))
+			ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base,
+						   0);
+		drm_exec_continue_on_contention(&ctx->exec);
+		if (unlikely(ret))
+			goto error;
 	}
-
-	ctx->reserved = true;
 	return 0;
+
+error:
+	pr_err("Failed to reserve buffers in ttm.\n");
+	drm_exec_fini(&ctx->exec);
+	return ret;
 }
 
 /**
@@ -950,63 +931,39 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 				struct amdgpu_vm *vm, enum bo_vm_match map_type,
 				struct bo_vm_reservation_context *ctx)
 {
-	struct amdgpu_bo *bo = mem->bo;
 	struct kfd_mem_attachment *entry;
-	unsigned int i;
+	struct amdgpu_bo *bo = mem->bo;
 	int ret;
 
-	ctx->reserved = false;
-	ctx->n_vms = 0;
-	ctx->vm_pd = NULL;
 	ctx->sync = &mem->sync;
+	drm_exec_init(&ctx->exec, true);
+	drm_exec_while_not_all_locked(&ctx->exec) {
+		ctx->n_vms = 0;
+		list_for_each_entry(entry, &mem->attachments, list) {
+			if ((vm && vm != entry->bo_va->base.vm) ||
+				(entry->is_mapped != map_type
+				&& map_type != BO_VM_ALL))
+				continue;
 
-	INIT_LIST_HEAD(&ctx->list);
-	INIT_LIST_HEAD(&ctx->duplicates);
-
-	list_for_each_entry(entry, &mem->attachments, list) {
-		if ((vm && vm != entry->bo_va->base.vm) ||
-			(entry->is_mapped != map_type
-			&& map_type != BO_VM_ALL))
-			continue;
-
-		ctx->n_vms++;
-	}
-
-	if (ctx->n_vms != 0) {
-		ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
-				     GFP_KERNEL);
-		if (!ctx->vm_pd)
-			return -ENOMEM;
-	}
-
-	ctx->kfd_bo.priority = 0;
-	ctx->kfd_bo.tv.bo = &bo->tbo;
-	ctx->kfd_bo.tv.num_shared = 1;
-	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
-
-	i = 0;
-	list_for_each_entry(entry, &mem->attachments, list) {
-		if ((vm && vm != entry->bo_va->base.vm) ||
-			(entry->is_mapped != map_type
-			&& map_type != BO_VM_ALL))
-			continue;
-
-		amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
-				&ctx->vm_pd[i]);
-		i++;
-	}
+			ret = amdgpu_vm_lock_pd(entry->bo_va->base.vm, &ctx->exec);
+			drm_exec_break_on_contention(&ctx->exec);
+			if (unlikely(ret))
+				goto error;
+			++ctx->n_vms;
+		}
+		drm_exec_continue_on_contention(&ctx->exec);
 
-	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-				     false, &ctx->duplicates);
-	if (ret) {
-		pr_err("Failed to reserve buffers in ttm.\n");
-		kfree(ctx->vm_pd);
-		ctx->vm_pd = NULL;
-		return ret;
+		ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
+		drm_exec_continue_on_contention(&ctx->exec);
+		if (unlikely(ret))
+			goto error;
 	}
-
-	ctx->reserved = true;
 	return 0;
+
+error:
+	pr_err("Failed to reserve buffers in ttm.\n");
+	drm_exec_fini(&ctx->exec);
+	return ret;
 }
 
 /**
@@ -1027,15 +984,8 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 	if (wait)
 		ret = amdgpu_sync_wait(ctx->sync, intr);
 
-	if (ctx->reserved)
-		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
-	kfree(ctx->vm_pd);
-
+	drm_exec_fini(&ctx->exec);
 	ctx->sync = NULL;
-
-	ctx->reserved = false;
-	ctx->vm_pd = NULL;
-
 	return ret;
 }
 
@@ -1616,7 +1566,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	unsigned long bo_size = mem->bo->tbo.base.size;
 	struct kfd_mem_attachment *entry, *tmp;
 	struct bo_vm_reservation_context ctx;
-	struct ttm_validate_buffer *bo_list_entry;
 	unsigned int mapped_to_gpu_memory;
 	int ret;
 	bool is_imported = false;
@@ -1644,9 +1593,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	}
 
 	/* Make sure restore workers don't access the BO any more */
-	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
-	list_del(&bo_list_entry->head);
+	list_del(&mem->validate_list);
 	mutex_unlock(&process_info->lock);
 
 	/* No more MMU notifiers */
@@ -1945,7 +1893,7 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
 
 	amdgpu_amdkfd_remove_eviction_fence(
 		bo, mem->process_info->eviction_fence);
-	list_del_init(&mem->validate_list.head);
+	list_del_init(&mem->validate_list);
 
 	if (size)
 		*size = amdgpu_bo_size(bo);
@@ -2107,7 +2055,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 	 */
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_valid_list,
-				 validate_list.head) {
+				 validate_list) {
 		if (!atomic_read(&mem->invalid))
 			continue; /* BO is still valid */
 
@@ -2124,7 +2072,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 			return -EAGAIN;
 		}
 
-		list_move_tail(&mem->validate_list.head,
+		list_move_tail(&mem->validate_list,
 			       &process_info->userptr_inval_list);
 	}
 
@@ -2133,7 +2081,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 
 	/* Go through userptr_inval_list and update any invalid user_pages */
 	list_for_each_entry(mem, &process_info->userptr_inval_list,
-			    validate_list.head) {
+			    validate_list) {
 		invalid = atomic_read(&mem->invalid);
 		if (!invalid)
 			/* BO hasn't been invalidated since the last
@@ -2184,50 +2132,44 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
  */
 static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 {
-	struct amdgpu_bo_list_entry *pd_bo_list_entries;
-	struct list_head resv_list, duplicates;
-	struct ww_acquire_ctx ticket;
+	struct ttm_operation_ctx ctx = { false, false };
 	struct amdgpu_sync sync;
+	struct drm_exec exec;
 
 	struct amdgpu_vm *peer_vm;
 	struct kgd_mem *mem, *tmp_mem;
 	struct amdgpu_bo *bo;
-	struct ttm_operation_ctx ctx = { false, false };
-	int i, ret;
-
-	pd_bo_list_entries = kcalloc(process_info->n_vms,
-				     sizeof(struct amdgpu_bo_list_entry),
-				     GFP_KERNEL);
-	if (!pd_bo_list_entries) {
-		pr_err("%s: Failed to allocate PD BO list entries\n", __func__);
-		ret = -ENOMEM;
-		goto out_no_mem;
-	}
-
-	INIT_LIST_HEAD(&resv_list);
-	INIT_LIST_HEAD(&duplicates);
+	int ret;
 
-	/* Get all the page directory BOs that need to be reserved */
-	i = 0;
-	list_for_each_entry(peer_vm, &process_info->vm_list_head,
-			    vm_list_node)
-		amdgpu_vm_get_pd_bo(peer_vm, &resv_list,
-				    &pd_bo_list_entries[i++]);
-	/* Add the userptr_inval_list entries to resv_list */
-	list_for_each_entry(mem, &process_info->userptr_inval_list,
-			    validate_list.head) {
-		list_add_tail(&mem->resv_list.head, &resv_list);
-		mem->resv_list.bo = mem->validate_list.bo;
-		mem->resv_list.num_shared = mem->validate_list.num_shared;
-	}
+	amdgpu_sync_create(&sync);
 
+	drm_exec_init(&exec, true);
 	/* Reserve all BOs and page tables for validation */
-	ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
-	WARN(!list_empty(&duplicates), "Duplicates should be empty");
-	if (ret)
-		goto out_free;
+	drm_exec_while_not_all_locked(&exec) {
+		/* Reserve all the page directories */
+		list_for_each_entry(peer_vm, &process_info->vm_list_head,
+				    vm_list_node) {
+			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto unreserve_out;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	amdgpu_sync_create(&sync);
+		/* Reserve the userptr_inval_list entries to resv_list */
+		list_for_each_entry(mem, &process_info->userptr_inval_list,
+				    validate_list) {
+			struct drm_gem_object *gobj;
+
+			gobj = &mem->bo->tbo.base;
+			ret = drm_exec_prepare_obj(&exec, gobj, 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto unreserve_out;
+		}
+		drm_exec_continue_on_contention(&exec);
+	}
+	WARN(drm_exec_has_duplicates(&exec), "Duplicates should be empty");
 
 	ret = process_validate_vms(process_info);
 	if (ret)
@@ -2236,7 +2178,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	/* Validate BOs and update GPUVM page tables */
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_inval_list,
-				 validate_list.head) {
+				 validate_list) {
 		struct kfd_mem_attachment *attachment;
 
 		bo = mem->bo;
@@ -2251,7 +2193,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 			}
 		}
 
-		list_move_tail(&mem->validate_list.head,
+		list_move_tail(&mem->validate_list,
 			       &process_info->userptr_valid_list);
 
 		/* Update mapping. If the BO was not validated
@@ -2279,12 +2221,9 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	ret = process_update_pds(process_info, &sync);
 
 unreserve_out:
-	ttm_eu_backoff_reservation(&ticket, &resv_list);
+	drm_exec_fini(&exec);
 	amdgpu_sync_wait(&sync, false);
 	amdgpu_sync_free(&sync);
-out_free:
-	kfree(pd_bo_list_entries);
-out_no_mem:
 
 	return ret;
 }
@@ -2381,50 +2320,46 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
  */
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 {
-	struct amdgpu_bo_list_entry *pd_bo_list;
 	struct amdkfd_process_info *process_info = info;
 	struct amdgpu_vm *peer_vm;
 	struct kgd_mem *mem;
-	struct bo_vm_reservation_context ctx;
 	struct amdgpu_amdkfd_fence *new_fence;
-	int ret = 0, i;
 	struct list_head duplicate_save;
 	struct amdgpu_sync sync_obj;
 	unsigned long failed_size = 0;
 	unsigned long total_size = 0;
+	struct drm_exec exec;
+	int ret;
 
 	INIT_LIST_HEAD(&duplicate_save);
-	INIT_LIST_HEAD(&ctx.list);
-	INIT_LIST_HEAD(&ctx.duplicates);
 
-	pd_bo_list = kcalloc(process_info->n_vms,
-			     sizeof(struct amdgpu_bo_list_entry),
-			     GFP_KERNEL);
-	if (!pd_bo_list)
-		return -ENOMEM;
-
-	i = 0;
 	mutex_lock(&process_info->lock);
-	list_for_each_entry(peer_vm, &process_info->vm_list_head,
-			vm_list_node)
-		amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++]);
 
-	/* Reserve all BOs and page tables/directory. Add all BOs from
-	 * kfd_bo_list to ctx.list
-	 */
-	list_for_each_entry(mem, &process_info->kfd_bo_list,
-			    validate_list.head) {
-
-		list_add_tail(&mem->resv_list.head, &ctx.list);
-		mem->resv_list.bo = mem->validate_list.bo;
-		mem->resv_list.num_shared = mem->validate_list.num_shared;
-	}
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		list_for_each_entry(peer_vm, &process_info->vm_list_head,
+				    vm_list_node) {
+			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto ttm_reserve_fail;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
-				     false, &duplicate_save);
-	if (ret) {
-		pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
-		goto ttm_reserve_fail;
+		/* Reserve all BOs and page tables/directory. Add all BOs from
+		 * kfd_bo_list to ctx.list
+		 */
+		list_for_each_entry(mem, &process_info->kfd_bo_list,
+				    validate_list) {
+			struct drm_gem_object *gobj;
+
+			gobj = &mem->bo->tbo.base;
+			ret = drm_exec_prepare_obj(&exec, gobj, 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto ttm_reserve_fail;
+		}
+		drm_exec_continue_on_contention(&exec);
 	}
 
 	amdgpu_sync_create(&sync_obj);
@@ -2442,7 +2377,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 
 	/* Validate BOs and map them to GPUVM (update VM page tables). */
 	list_for_each_entry(mem, &process_info->kfd_bo_list,
-			    validate_list.head) {
+			    validate_list) {
 
 		struct amdgpu_bo *bo = mem->bo;
 		uint32_t domain = mem->domain;
@@ -2515,8 +2450,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	*ef = dma_fence_get(&new_fence->base);
 
 	/* Attach new eviction fence to all BOs */
-	list_for_each_entry(mem, &process_info->kfd_bo_list,
-		validate_list.head)
+	list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list)
 		amdgpu_bo_fence(mem->bo,
 			&process_info->eviction_fence->base, true);
 
@@ -2529,11 +2463,10 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	}
 
 validate_map_fail:
-	ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
 	amdgpu_sync_free(&sync_obj);
 ttm_reserve_fail:
+	drm_exec_fini(&exec);
 	mutex_unlock(&process_info->lock);
-	kfree(pd_bo_list);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5277c10d901d..c82c580f1df5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -33,6 +33,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_exec.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -639,6 +640,19 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 	list_add(&entry->tv.head, validated);
 }
 
+/**
+ * amdgpu_vm_lock_pd - lock PD in drm_exec
+ *
+ * @vm: vm providing the BOs
+ * @exec: drm execution context
+ *
+ * Lock the VM root PD in the DRM execution context.
+ */
+int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec)
+{
+	return drm_exec_prepare_obj(exec, &vm->root.bo->tbo.base, 4);
+}
+
 /**
  * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index bd7892482bbf..15d26f442e70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -36,6 +36,8 @@
 #include "amdgpu_ring.h"
 #include "amdgpu_ids.h"
 
+struct drm_exec;
+
 struct amdgpu_bo_va;
 struct amdgpu_job;
 struct amdgpu_bo_list_entry;
@@ -383,6 +385,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 			 struct list_head *validated,
 			 struct amdgpu_bo_list_entry *entry);
+int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec);
 bool amdgpu_vm_ready(struct amdgpu_vm *vm);
 int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			      int (*callback)(void *p, struct amdgpu_bo *bo),
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index b3fc3e958227..b5cb234e9f77 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -23,6 +23,8 @@
 
 #include <linux/types.h>
 #include <linux/sched/task.h>
+#include <drm/drm_exec.h>
+
 #include "amdgpu_sync.h"
 #include "amdgpu_object.h"
 #include "amdgpu_vm.h"
@@ -1373,9 +1375,7 @@ struct svm_validate_context {
 	struct svm_range *prange;
 	bool intr;
 	unsigned long bitmap[MAX_GPU_INSTANCE];
-	struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
-	struct list_head validate_list;
-	struct ww_acquire_ctx ticket;
+	struct drm_exec exec;
 };
 
 static int svm_range_reserve_bos(struct svm_validate_context *ctx)
@@ -1385,25 +1385,23 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
 	uint32_t gpuidx;
 	int r;
 
-	INIT_LIST_HEAD(&ctx->validate_list);
+	drm_exec_init(&ctx->exec, true);
 	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
 		pdd = kfd_process_device_from_gpuidx(ctx->process, gpuidx);
 		if (!pdd) {
 			pr_debug("failed to find device idx %d\n", gpuidx);
-			return -EINVAL;
+			r = -EINVAL;
+			goto unreserve_out;
 		}
 		vm = drm_priv_to_vm(pdd->drm_priv);
 
-		ctx->tv[gpuidx].bo = &vm->root.bo->tbo;
-		ctx->tv[gpuidx].num_shared = 4;
-		list_add(&ctx->tv[gpuidx].head, &ctx->validate_list);
-	}
-
-	r = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->validate_list,
-				   ctx->intr, NULL);
-	if (r) {
-		pr_debug("failed %d to reserve bo\n", r);
-		return r;
+		r = amdgpu_vm_lock_pd(vm, &ctx->exec);
+		if (unlikely(r == -EDEADLK))
+			continue;
+		if (unlikely(r)) {
+			pr_debug("failed %d to reserve bo\n", r);
+			goto unreserve_out;
+		}
 	}
 
 	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
@@ -1426,13 +1424,13 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
 	return 0;
 
 unreserve_out:
-	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
+	drm_exec_fini(&ctx->exec);
 	return r;
 }
 
 static void svm_range_unreserve_bos(struct svm_validate_context *ctx)
 {
-	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
+	drm_exec_fini(&ctx->exec);
 }
 
 static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)
-- 
2.25.1



* [PATCH 4/8] drm/amdgpu: use drm_exec for GEM and CSA handling
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
                   ` (2 preceding siblings ...)
  2022-05-04  7:47 ` [PATCH 3/8] drm/amdkfd: switch over to using drm_exec Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-04  7:47 ` [PATCH 5/8] drm/amdgpu: use the new drm_exec object for CS Christian König
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Start using the new component here as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 42 ++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 77 +++++++++++--------------
 2 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index c6d4d41c4393..ea434c8de047 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -22,6 +22,8 @@
  * * Author: Monk.liu@amd.com
  */
 
+#include <drm/drm_exec.h>
+
 #include "amdgpu.h"
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
@@ -65,31 +67,25 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			  struct amdgpu_bo *bo, struct amdgpu_bo_va **bo_va,
 			  uint64_t csa_addr, uint32_t size)
 {
-	struct ww_acquire_ctx ticket;
-	struct list_head list;
-	struct amdgpu_bo_list_entry pd;
-	struct ttm_validate_buffer csa_tv;
+	struct drm_exec exec;
 	int r;
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&csa_tv.head);
-	csa_tv.bo = &bo->tbo;
-	csa_tv.num_shared = 1;
-
-	list_add(&csa_tv.head, &list);
-	amdgpu_vm_get_pd_bo(vm, &list, &pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
-	if (r) {
-		DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
-		return r;
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		r = amdgpu_vm_lock_pd(vm, &exec);
+		if (likely(!r))
+			r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 0);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r)) {
+			DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
+			goto error;
+		}
 	}
 
 	*bo_va = amdgpu_vm_bo_add(adev, vm, bo);
 	if (!*bo_va) {
-		ttm_eu_backoff_reservation(&ticket, &list);
-		DRM_ERROR("failed to create bo_va for static CSA\n");
-		return -ENOMEM;
+		r = -ENOMEM;
+		goto error;
 	}
 
 	r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
@@ -99,10 +95,10 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r) {
 		DRM_ERROR("failed to do bo_map on static CSA, err=%d\n", r);
 		amdgpu_vm_bo_del(adev, *bo_va);
-		ttm_eu_backoff_reservation(&ticket, &list);
-		return r;
+		goto error;
 	}
 
-	ttm_eu_backoff_reservation(&ticket, &list);
-	return 0;
+error:
+	drm_exec_fini(&exec);
+	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 84a53758e18e..99d6d3ab452e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -33,6 +33,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_gem_ttm_helper.h>
 
 #include "amdgpu.h"
@@ -195,29 +196,23 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_vm *vm = &fpriv->vm;
 
-	struct amdgpu_bo_list_entry vm_pd;
-	struct list_head list, duplicates;
 	struct dma_fence *fence = NULL;
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
 	struct amdgpu_bo_va *bo_va;
+	struct drm_exec exec;
 	long r;
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&duplicates);
-
-	tv.bo = &bo->tbo;
-	tv.num_shared = 2;
-	list_add(&tv.head, &list);
-
-	amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
-	if (r) {
-		dev_err(adev->dev, "leaking bo va because "
-			"we fail to reserve bo (%ld)\n", r);
-		return;
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 0);
+		if (likely(!r))
+			r = amdgpu_vm_lock_pd(vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r)) {
+			dev_err(adev->dev, "leaking bo va (%ld)\n", r);
+			goto out_unlock;
+		}
 	}
+
 	bo_va = amdgpu_vm_bo_find(vm, bo);
 	if (!bo_va || --bo_va->ref_count)
 		goto out_unlock;
@@ -227,6 +222,9 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 		goto out_unlock;
 
 	r = amdgpu_vm_clear_freed(adev, vm, &fence);
+	if (unlikely(r < 0))
+		dev_err(adev->dev, "failed to clear page "
+			"tables on GEM object close (%ld)\n", r);
 	if (r || !fence)
 		goto out_unlock;
 
@@ -234,10 +232,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	dma_fence_put(fence);
 
 out_unlock:
-	if (unlikely(r < 0))
-		dev_err(adev->dev, "failed to clear page "
-			"tables on GEM object close (%ld)\n", r);
-	ttm_eu_backoff_reservation(&ticket, &list);
+	drm_exec_fini(&exec);
 }
 
 static int amdgpu_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
@@ -668,10 +663,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct amdgpu_bo *abo;
 	struct amdgpu_bo_va *bo_va;
-	struct amdgpu_bo_list_entry vm_pd;
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
-	struct list_head list, duplicates;
+	struct drm_exec exec;
 	uint64_t va_flags;
 	uint64_t vm_size;
 	int r = 0;
@@ -721,36 +713,37 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&duplicates);
 	if ((args->operation != AMDGPU_VA_OP_CLEAR) &&
 	    !(args->flags & AMDGPU_VM_PAGE_PRT)) {
 		gobj = drm_gem_object_lookup(filp, args->handle);
 		if (gobj == NULL)
 			return -ENOENT;
 		abo = gem_to_amdgpu_bo(gobj);
-		tv.bo = &abo->tbo;
-		if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID)
-			tv.num_shared = 1;
-		else
-			tv.num_shared = 0;
-		list_add(&tv.head, &list);
 	} else {
 		gobj = NULL;
 		abo = NULL;
 	}
 
-	amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		if (gobj) {
+			r = drm_exec_prepare_obj(&exec, gobj, 0);
+			drm_exec_continue_on_contention(&exec);
+			if (unlikely(r))
+				goto error;
+		}
 
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
-	if (r)
-		goto error_unref;
+		r = amdgpu_vm_lock_pd(&fpriv->vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto error;
+	}
 
 	if (abo) {
 		bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
 		if (!bo_va) {
 			r = -ENOENT;
-			goto error_backoff;
+			goto error;
 		}
 	} else if (args->operation != AMDGPU_VA_OP_CLEAR) {
 		bo_va = fpriv->prt_va;
@@ -787,10 +780,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
 					args->operation);
 
-error_backoff:
-	ttm_eu_backoff_reservation(&ticket, &list);
-
-error_unref:
+error:
+	drm_exec_fini(&exec);
 	drm_gem_object_put(gobj);
 	return r;
 }
-- 
2.25.1



* [PATCH 5/8] drm/amdgpu: use the new drm_exec object for CS
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
                   ` (3 preceding siblings ...)
  2022-05-04  7:47 ` [PATCH 4/8] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-04  7:47 ` [PATCH 6/8] drm/radeon: switch over to drm_exec Christian König
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Use the new component here as well and remove the old handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  70 ++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 236 ++++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
 7 files changed, 126 insertions(+), 221 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cdf0818088b3..08aae66557dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -55,7 +55,6 @@
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_gem.h>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 714178f1b6c6..cdd0f9995496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -28,6 +28,7 @@
  *    Christian König <deathsimple@vodafone.de>
  */
 
+#include <linux/sort.h>
 #include <linux/uaccess.h>
 
 #include "amdgpu.h"
@@ -50,13 +51,20 @@ static void amdgpu_bo_list_free(struct kref *ref)
 						   refcount);
 	struct amdgpu_bo_list_entry *e;
 
-	amdgpu_bo_list_for_each_entry(e, list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	amdgpu_bo_list_for_each_entry(e, list)
+		amdgpu_bo_unref(&e->bo);
+	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);
+}
 
-		amdgpu_bo_unref(&bo);
-	}
+static int amdgpu_bo_list_entry_cmp(const void *_a, const void *_b)
+{
+	const struct amdgpu_bo_list_entry *a = _a, *b = _b;
 
-	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);
+	if (a->priority > b->priority)
+		return 1;
+	if (a->priority < b->priority)
+		return -1;
+	return 0;
 }
 
 int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
@@ -118,7 +126,7 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
-		entry->tv.bo = &bo->tbo;
+		entry->bo = bo;
 
 		if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
 			list->gds_obj = bo;
@@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 
 	list->first_userptr = first_userptr;
 	list->num_entries = num_entries;
+	sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
+	     amdgpu_bo_list_entry_cmp, NULL);
 
 	trace_amdgpu_cs_bo_status(list->num_entries, total_size);
 
@@ -140,16 +150,10 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 	return 0;
 
 error_free:
-	for (i = 0; i < last_entry; ++i) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
-
-		amdgpu_bo_unref(&bo);
-	}
-	for (i = first_userptr; i < num_entries; ++i) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
-
-		amdgpu_bo_unref(&bo);
-	}
+	for (i = 0; i < last_entry; ++i)
+		amdgpu_bo_unref(&array[i].bo);
+	for (i = first_userptr; i < num_entries; ++i)
+		amdgpu_bo_unref(&array[i].bo);
 	kvfree(list);
 	return r;
 
@@ -181,40 +185,6 @@ int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,
 	return -ENOENT;
 }
 
-void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
-			     struct list_head *validated)
-{
-	/* This is based on the bucket sort with O(n) time complexity.
-	 * An item with priority "i" is added to bucket[i]. The lists are then
-	 * concatenated in descending order.
-	 */
-	struct list_head bucket[AMDGPU_BO_LIST_NUM_BUCKETS];
-	struct amdgpu_bo_list_entry *e;
-	unsigned i;
-
-	for (i = 0; i < AMDGPU_BO_LIST_NUM_BUCKETS; i++)
-		INIT_LIST_HEAD(&bucket[i]);
-
-	/* Since buffers which appear sooner in the relocation list are
-	 * likely to be used more often than buffers which appear later
-	 * in the list, the sort mustn't change the ordering of buffers
-	 * with the same priority, i.e. it must be stable.
-	 */
-	amdgpu_bo_list_for_each_entry(e, list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-		unsigned priority = e->priority;
-
-		if (!bo->parent)
-			list_add_tail(&e->tv.head, &bucket[priority]);
-
-		e->user_pages = NULL;
-	}
-
-	/* Connect the sorted buckets in the output list. */
-	for (i = 0; i < AMDGPU_BO_LIST_NUM_BUCKETS; i++)
-		list_splice(&bucket[i], validated);
-}
-
 void amdgpu_bo_list_put(struct amdgpu_bo_list *list)
 {
 	kref_put(&list->refcount, amdgpu_bo_list_free);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 529d52a204cf..248d9dc6c1fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -23,16 +23,17 @@
 #ifndef __AMDGPU_BO_LIST_H__
 #define __AMDGPU_BO_LIST_H__
 
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/amdgpu_drm.h>
 
+struct drm_file;
+
 struct amdgpu_device;
 struct amdgpu_bo;
 struct amdgpu_bo_va;
 struct amdgpu_fpriv;
 
 struct amdgpu_bo_list_entry {
-	struct ttm_validate_buffer	tv;
+	struct amdgpu_bo		*bo;
 	struct amdgpu_bo_va		*bo_va;
 	uint32_t			priority;
 	struct page			**user_pages;
@@ -51,8 +52,6 @@ struct amdgpu_bo_list {
 
 int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,
 		       struct amdgpu_bo_list **result);
-void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
-			     struct list_head *validated);
 void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
 int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
 				      struct drm_amdgpu_bo_list_entry **info_param);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8de283997769..a28b7947a034 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -44,7 +44,6 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,
 				      uint32_t *offset)
 {
 	struct drm_gem_object *gobj;
-	struct amdgpu_bo *bo;
 	unsigned long size;
 	int r;
 
@@ -52,21 +51,16 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,
 	if (gobj == NULL)
 		return -EINVAL;
 
-	bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
-	p->uf_entry.priority = 0;
-	p->uf_entry.tv.bo = &bo->tbo;
-	/* One for TTM and two for the CS job */
-	p->uf_entry.tv.num_shared = 3;
-
+	p->uf_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
 	drm_gem_object_put(gobj);
 
-	size = amdgpu_bo_size(bo);
+	size = amdgpu_bo_size(p->uf_bo);
 	if (size != PAGE_SIZE || (data->offset + 8) > size) {
 		r = -EINVAL;
 		goto error_unref;
 	}
 
-	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
+	if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm)) {
 		r = -EINVAL;
 		goto error_unref;
 	}
@@ -76,7 +70,7 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,
 	return 0;
 
 error_unref:
-	amdgpu_bo_unref(&bo);
+	amdgpu_bo_unref(&p->uf_bo);
 	return r;
 }
 
@@ -115,6 +109,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
 	int i;
 	int ret;
 
+	drm_exec_init(&p->exec, true);
+
 	if (cs->in.num_chunks == 0)
 		return 0;
 
@@ -235,7 +231,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
 		goto free_all_kdata;
 	}
 
-	if (p->uf_entry.tv.bo)
+	if (p->uf_bo)
 		p->job->uf_addr = uf_offset;
 	kvfree(chunk_array);
 
@@ -449,57 +445,20 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 	return r;
 }
 
-static int amdgpu_cs_list_validate(struct amdgpu_cs_parser *p,
-			    struct list_head *validated)
-{
-	struct ttm_operation_ctx ctx = { true, false };
-	struct amdgpu_bo_list_entry *lobj;
-	int r;
-
-	list_for_each_entry(lobj, validated, tv.head) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(lobj->tv.bo);
-		struct mm_struct *usermm;
-
-		usermm = amdgpu_ttm_tt_get_usermm(bo->tbo.ttm);
-		if (usermm && usermm != current->mm)
-			return -EPERM;
-
-		if (amdgpu_ttm_tt_is_userptr(bo->tbo.ttm) &&
-		    lobj->user_invalidated && lobj->user_pages) {
-			amdgpu_bo_placement_from_domain(bo,
-							AMDGPU_GEM_DOMAIN_CPU);
-			r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-			if (r)
-				return r;
-
-			amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm,
-						     lobj->user_pages);
-		}
-
-		r = amdgpu_cs_bo_validate(p, bo);
-		if (r)
-			return r;
-
-		kvfree(lobj->user_pages);
-		lobj->user_pages = NULL;
-	}
-	return 0;
-}
-
 static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				union drm_amdgpu_cs *cs)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	struct ttm_operation_ctx ctx = { true, false };
 	struct amdgpu_vm *vm = &fpriv->vm;
 	struct amdgpu_bo_list_entry *e;
-	struct list_head duplicates;
+	struct drm_gem_object *obj;
 	struct amdgpu_bo *gds;
 	struct amdgpu_bo *gws;
 	struct amdgpu_bo *oa;
+	unsigned long index;
 	int r;
 
-	INIT_LIST_HEAD(&p->validated);
-
 	/* p->bo_list could already be assigned if AMDGPU_CHUNK_ID_BO_HANDLES is present */
 	if (cs->in.bo_list_handle) {
 		if (p->bo_list)
@@ -517,25 +476,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 			return r;
 	}
 
-	/* One for TTM and one for the CS job */
-	amdgpu_bo_list_for_each_entry(e, p->bo_list)
-		e->tv.num_shared = 2;
-
-	amdgpu_bo_list_get_list(p->bo_list, &p->validated);
-
-	INIT_LIST_HEAD(&duplicates);
-	amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd);
-
-	if (p->uf_entry.tv.bo && !ttm_to_amdgpu_bo(p->uf_entry.tv.bo)->parent)
-		list_add(&p->uf_entry.tv.head, &p->validated);
-
 	/* Get userptr backing pages. If pages are updated after registered
 	 * in amdgpu_gem_userptr_ioctl(), amdgpu_cs_list_validate() will do
 	 * amdgpu_ttm_backend_bind() to flush and invalidate new pages
 	 */
 	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 		bool userpage_invalidated = false;
+		struct amdgpu_bo *bo = e->bo;
 		int i;
 
 		e->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
@@ -562,18 +509,53 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		e->user_invalidated = userpage_invalidated;
 	}
 
-	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-				   &duplicates);
-	if (unlikely(r != 0)) {
-		if (r != -ERESTARTSYS)
-			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
-		goto out;
+	drm_exec_while_not_all_locked(&p->exec) {
+		r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
+		drm_exec_continue_on_contention(&p->exec);
+		if (unlikely(r))
+			return r;
+
+		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
+			drm_exec_break_on_contention(&p->exec);
+			if (unlikely(r))
+				return r;
+
+			e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
+		}
+		drm_exec_continue_on_contention(&p->exec);
+
+		if (p->uf_bo) {
+			r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
+						 2);
+			drm_exec_continue_on_contention(&p->exec);
+			if (unlikely(r))
+				return r;
+		}
 	}
 
-	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
+		struct mm_struct *usermm;
+
+		usermm = amdgpu_ttm_tt_get_usermm(e->bo->tbo.ttm);
+		if (usermm && usermm != current->mm)
+			return -EPERM;
+
+		if (amdgpu_ttm_tt_is_userptr(e->bo->tbo.ttm) &&
+		    e->user_invalidated && e->user_pages) {
+			amdgpu_bo_placement_from_domain(e->bo,
+							AMDGPU_GEM_DOMAIN_CPU);
+			r = ttm_bo_validate(&e->bo->tbo, &e->bo->placement,
+					    &ctx);
+			if (r)
+				return r;
+
+			amdgpu_ttm_tt_set_user_pages(e->bo->tbo.ttm,
+						     e->user_pages);
+		}
 
-		e->bo_va = amdgpu_vm_bo_find(vm, bo);
+		kvfree(e->user_pages);
+		e->user_pages = NULL;
 	}
 
 	/* Move fence waiting after getting reservation lock of
@@ -583,7 +565,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	if (unlikely(r != 0)) {
 		if (r != -ERESTARTSYS)
 			DRM_ERROR("amdgpu_ctx_wait_prev_fence failed.\n");
-		goto error_validate;
+		return r;
 	}
 
 	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
@@ -595,16 +577,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				      amdgpu_cs_bo_validate, p);
 	if (r) {
 		DRM_ERROR("amdgpu_vm_validate_pt_bos() failed.\n");
-		goto error_validate;
+		return r;
 	}
 
-	r = amdgpu_cs_list_validate(p, &duplicates);
-	if (r)
-		goto error_validate;
+	drm_exec_for_each_duplicate_object(&p->exec, index, obj) {
+		r = amdgpu_cs_bo_validate(p, gem_to_amdgpu_bo(obj));
+		if (unlikely(r))
+			return r;
+	}
 
-	r = amdgpu_cs_list_validate(p, &p->validated);
-	if (r)
-		goto error_validate;
+	drm_exec_for_each_locked_object(&p->exec, index, obj) {
+		r = amdgpu_cs_bo_validate(p, gem_to_amdgpu_bo(obj));
+		if (unlikely(r))
+			return r;
+	}
 
 	amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
 				     p->bytes_moved_vis);
@@ -626,28 +612,25 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
 	}
 
-	if (!r && p->uf_entry.tv.bo) {
-		struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo);
+	if (p->uf_bo) {
+		r = amdgpu_ttm_alloc_gart(&p->uf_bo->tbo);
+		if (unlikely(r))
+			return r;
 
-		r = amdgpu_ttm_alloc_gart(&uf->tbo);
-		p->job->uf_addr += amdgpu_bo_gpu_offset(uf);
+		p->job->uf_addr += amdgpu_bo_gpu_offset(p->uf_bo);
 	}
-
-error_validate:
-	if (r)
-		ttm_eu_backoff_reservation(&p->ticket, &p->validated);
-out:
-	return r;
+	return 0;
 }
 
 static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
-	struct amdgpu_bo_list_entry *e;
+	struct drm_gem_object *obj;
+	unsigned long index;
 	int r;
 
-	list_for_each_entry(e, &p->validated, tv.head) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	drm_exec_for_each_locked_object(&p->exec, index, obj) {
+		struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 		struct dma_resv *resv = bo->tbo.base.resv;
 		enum amdgpu_sync_mode sync_mode;
 
@@ -664,20 +647,14 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 /**
  * amdgpu_cs_parser_fini() - clean parser states
  * @parser:	parser structure holding parsing context.
- * @error:	error number
- * @backoff:	indicator to backoff the reservation
  *
- * If error is set then unvalidate buffer, otherwise just free memory
- * used by parsing context.
+ * Just free memory used by parsing context.
  **/
-static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
-				  bool backoff)
+static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser)
 {
 	unsigned i;
 
-	if (error && backoff)
-		ttm_eu_backoff_reservation(&parser->ticket,
-					   &parser->validated);
+	drm_exec_fini(&parser->exec);
 
 	for (i = 0; i < parser->num_post_deps; i++) {
 		drm_syncobj_put(parser->post_deps[i].syncobj);
@@ -698,11 +675,7 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
 	kvfree(parser->chunks);
 	if (parser->job)
 		amdgpu_job_free(parser->job);
-	if (parser->uf_entry.tv.bo) {
-		struct amdgpu_bo *uf = ttm_to_amdgpu_bo(parser->uf_entry.tv.bo);
-
-		amdgpu_bo_unref(&uf);
-	}
+	amdgpu_bo_unref(&parser->uf_bo);
 }
 
 static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
@@ -806,11 +779,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	}
 
 	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-		/* ignore duplicates */
-		bo = ttm_to_amdgpu_bo(e->tv.bo);
-		if (!bo)
-			continue;
-
+		bo = e->bo;
 		bo_va = e->bo_va;
 		if (bo_va == NULL)
 			continue;
@@ -840,15 +809,8 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 
 	if (amdgpu_vm_debug) {
 		/* Invalidate all BOs to test for userspace bugs */
-		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-			struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-
-			/* ignore duplicates */
-			if (!bo)
-				continue;
-
-			amdgpu_vm_bo_invalidate(adev, bo, false);
-		}
+		amdgpu_bo_list_for_each_entry(e, p->bo_list)
+			amdgpu_vm_bo_invalidate(adev, e->bo, false);
 	}
 
 	return amdgpu_cs_sync_rings(p);
@@ -1197,7 +1159,9 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
 	struct drm_sched_entity *entity = p->entity;
 	struct amdgpu_bo_list_entry *e;
+	struct drm_gem_object *gobj;
 	struct amdgpu_job *job;
+	unsigned long index;
 	uint64_t seq;
 	int r;
 
@@ -1219,11 +1183,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
 	 * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
 	 */
-	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-
-		r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
-	}
+	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list)
+		r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm);
 	if (r) {
 		r = -EAGAIN;
 		goto error_abort;
@@ -1246,16 +1207,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	amdgpu_job_free_resources(job);
 
 	trace_amdgpu_cs_ioctl(job);
-	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
+	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->exec.ticket);
 	drm_sched_entity_push_job(&job->base);
 
 	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
-	/* Make sure all BOs are remembered as writers */
-	amdgpu_bo_list_for_each_entry(e, p->bo_list)
-		e->tv.num_shared = 0;
+	drm_exec_for_each_duplicate_object(&p->exec, index, gobj) {
+		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
+		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
+	}
+	drm_exec_for_each_locked_object(&p->exec, index, gobj) {
+		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
+		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
+	}
 
-	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
 	mutex_unlock(&p->adev->notifier_lock);
 
 	return 0;
@@ -1285,7 +1250,6 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	union drm_amdgpu_cs *cs = data;
 	struct amdgpu_cs_parser parser = {};
-	bool reserved_buffers = false;
 	int r;
 
 	if (amdgpu_ras_intr_triggered())
@@ -1323,8 +1287,6 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 	}
 
-	reserved_buffers = true;
-
 	trace_amdgpu_cs_ibs(&parser);
 
 	r = amdgpu_cs_vm_handling(&parser);
@@ -1333,7 +1295,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 	r = amdgpu_cs_submit(&parser, cs);
 out:
-	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
+	amdgpu_cs_parser_fini(&parser);
 
 	return r;
 }
@@ -1665,7 +1627,7 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
 	*map = mapping;
 
 	/* Double check that the BO is reserved by this CS */
-	if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->ticket)
+	if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket)
 		return -EINVAL;
 
 	if (!((*bo)->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
index 30ecc4917f81..f447a9c533b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
@@ -23,6 +23,8 @@
 #ifndef __AMDGPU_CS_H__
 #define __AMDGPU_CS_H__
 
+#include <drm/drm_exec.h>
+
 #include "amdgpu_job.h"
 #include "amdgpu_bo_list.h"
 #include "amdgpu_ring.h"
@@ -55,11 +57,9 @@ struct amdgpu_cs_parser {
 	struct drm_sched_entity	*entity;
 
 	/* buffer objects */
-	struct ww_acquire_ctx		ticket;
+	struct drm_exec			exec;
 	struct amdgpu_bo_list		*bo_list;
 	struct amdgpu_mn		*mn;
-	struct amdgpu_bo_list_entry	vm_pd;
-	struct list_head		validated;
 	struct dma_fence		*fence;
 	uint64_t			bytes_moved_threshold;
 	uint64_t			bytes_moved_vis_threshold;
@@ -67,7 +67,7 @@ struct amdgpu_cs_parser {
 	uint64_t			bytes_moved_vis;
 
 	/* user fence */
-	struct amdgpu_bo_list_entry	uf_entry;
+	struct amdgpu_bo		*uf_bo;
 
 	unsigned			num_post_deps;
 	struct amdgpu_cs_post_dep	*post_deps;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c82c580f1df5..7e5cc8323329 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -618,28 +618,6 @@ static void amdgpu_vm_pt_next_dfs(struct amdgpu_device *adev,
 	     amdgpu_vm_pt_continue_dfs((start), (entry));			\
 	     (entry) = (cursor).entry, amdgpu_vm_pt_next_dfs((adev), &(cursor)))
 
-/**
- * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
- *
- * @vm: vm providing the BOs
- * @validated: head of validation list
- * @entry: entry to add
- *
- * Add the page directory to the list of BOs to
- * validate for command submission.
- */
-void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
-			 struct list_head *validated,
-			 struct amdgpu_bo_list_entry *entry)
-{
-	entry->priority = 0;
-	entry->tv.bo = &vm->root.bo->tbo;
-	/* Two for VM updates, one for TTM and one for the CS job */
-	entry->tv.num_shared = 4;
-	entry->user_pages = NULL;
-	list_add(&entry->tv.head, validated);
-}
-
 /**
  * amdgpu_vm_lock_pd - lock PD in drm_exec
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 15d26f442e70..75c8f10b5a39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -382,9 +382,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
-void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
-			 struct list_head *validated,
-			 struct amdgpu_bo_list_entry *entry);
 int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec);
 bool amdgpu_vm_ready(struct amdgpu_vm *vm);
 int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/8] drm/radeon: switch over to drm_exec
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
                   ` (4 preceding siblings ...)
  2022-05-04  7:47 ` [PATCH 5/8] drm/amdgpu: use the new drm_exec object for CS Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-04  7:47 ` [PATCH 7/8] drm/qxl: switch to using drm_exec Christian König
  2022-05-04  7:47 ` [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx Christian König
  7 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Just a straightforward conversion without any optimization.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  7 ++--
 drivers/gpu/drm/radeon/radeon_cs.c     | 45 +++++++++++++-------------
 drivers/gpu/drm/radeon/radeon_gem.c    | 40 +++++++++++++----------
 drivers/gpu/drm/radeon/radeon_object.c | 25 +++++++-------
 drivers/gpu/drm/radeon/radeon_object.h |  2 +-
 drivers/gpu/drm/radeon/radeon_vm.c     | 10 +++---
 6 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 08f83bf2c330..841b463a0bea 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -76,8 +76,8 @@
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 
+#include <drm/drm_exec.h>
 #include <drm/drm_gem.h>
 
 #include "radeon_family.h"
@@ -458,7 +458,8 @@ struct radeon_mman {
 
 struct radeon_bo_list {
 	struct radeon_bo		*robj;
-	struct ttm_validate_buffer	tv;
+	struct list_head		list;
+	bool				shared;
 	uint64_t			gpu_offset;
 	unsigned			preferred_domains;
 	unsigned			allowed_domains;
@@ -1069,6 +1070,7 @@ struct radeon_cs_parser {
 	struct radeon_bo_list	*vm_bos;
 	struct list_head	validated;
 	unsigned		dma_reloc_idx;
+	struct drm_exec		exec;
 	/* indices of various chunks */
 	struct radeon_cs_chunk  *chunk_ib;
 	struct radeon_cs_chunk  *chunk_relocs;
@@ -1082,7 +1084,6 @@ struct radeon_cs_parser {
 	u32			cs_flags;
 	u32			ring;
 	s32			priority;
-	struct ww_acquire_ctx	ticket;
 };
 
 static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 446f7bae54c4..50613a2f0c90 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -182,11 +182,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 			}
 		}
 
-		p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
-		p->relocs[i].tv.num_shared = !r->write_domain;
-
-		radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
-				      priority);
+		p->relocs[i].shared = !r->write_domain;
+		radeon_cs_buckets_add(&buckets, &p->relocs[i].list, priority);
 	}
 
 	radeon_cs_buckets_get_list(&buckets, &p->validated);
@@ -197,7 +194,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 	if (need_mmap_lock)
 		mmap_read_lock(current->mm);
 
-	r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
+	r = radeon_bo_list_validate(p->rdev, &p->exec, &p->validated, p->ring);
 
 	if (need_mmap_lock)
 		mmap_read_unlock(current->mm);
@@ -253,12 +250,11 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 	struct radeon_bo_list *reloc;
 	int r;
 
-	list_for_each_entry(reloc, &p->validated, tv.head) {
+	list_for_each_entry(reloc, &p->validated, list) {
 		struct dma_resv *resv;
 
 		resv = reloc->robj->tbo.base.resv;
-		r = radeon_sync_resv(p->rdev, &p->ib.sync, resv,
-				     reloc->tv.num_shared);
+		r = radeon_sync_resv(p->rdev, &p->ib.sync, resv, reloc->shared);
 		if (r)
 			return r;
 	}
@@ -275,6 +271,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	s32 priority = 0;
 
 	INIT_LIST_HEAD(&p->validated);
+	drm_exec_init(&p->exec, true);
 
 	if (!cs->num_chunks) {
 		return 0;
@@ -396,8 +393,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 static int cmp_size_smaller_first(void *priv, const struct list_head *a,
 				  const struct list_head *b)
 {
-	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head);
-	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head);
+	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, list);
+	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, list);
 
 	/* Sort A before B if A is smaller. */
 	return (int)la->robj->tbo.resource->num_pages -
@@ -413,11 +410,13 @@ static int cmp_size_smaller_first(void *priv, const struct list_head *a,
  * If error is set than unvalidate buffer, otherwise just free memory
  * used by parsing context.
  **/
-static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bool backoff)
+static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 {
 	unsigned i;
 
 	if (!error) {
+		struct radeon_bo_list *reloc;
+
 		/* Sort the buffer list from the smallest to largest buffer,
 		 * which affects the order of buffers in the LRU list.
 		 * This assures that the smallest buffers are added first
@@ -429,15 +428,17 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 		 * per frame under memory pressure.
 		 */
 		list_sort(NULL, &parser->validated, cmp_size_smaller_first);
-
-		ttm_eu_fence_buffer_objects(&parser->ticket,
-					    &parser->validated,
-					    &parser->ib.fence->base);
-	} else if (backoff) {
-		ttm_eu_backoff_reservation(&parser->ticket,
-					   &parser->validated);
+		list_for_each_entry(reloc, &parser->validated, list) {
+			dma_resv_add_fence(reloc->robj->tbo.base.resv,
+					   &parser->ib.fence->base,
+					   reloc->shared ?
+					   DMA_RESV_USAGE_READ :
+					   DMA_RESV_USAGE_WRITE);
+		}
 	}
 
+	drm_exec_fini(&parser->exec);
+
 	if (parser->relocs != NULL) {
 		for (i = 0; i < parser->nrelocs; i++) {
 			struct radeon_bo *bo = parser->relocs[i].robj;
@@ -689,7 +690,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	r = radeon_cs_parser_init(&parser, data);
 	if (r) {
 		DRM_ERROR("Failed to initialize parser !\n");
-		radeon_cs_parser_fini(&parser, r, false);
+		radeon_cs_parser_fini(&parser, r);
 		up_read(&rdev->exclusive_lock);
 		r = radeon_cs_handle_lockup(rdev, r);
 		return r;
@@ -703,7 +704,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	}
 
 	if (r) {
-		radeon_cs_parser_fini(&parser, r, false);
+		radeon_cs_parser_fini(&parser, r);
 		up_read(&rdev->exclusive_lock);
 		r = radeon_cs_handle_lockup(rdev, r);
 		return r;
@@ -720,7 +721,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 	}
 out:
-	radeon_cs_parser_fini(&parser, r, true);
+	radeon_cs_parser_fini(&parser, r);
 	up_read(&rdev->exclusive_lock);
 	r = radeon_cs_handle_lockup(rdev, r);
 	return r;
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 8c01a7f0e027..cd4540e1a306 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -625,33 +625,41 @@ int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 static void radeon_gem_va_update_vm(struct radeon_device *rdev,
 				    struct radeon_bo_va *bo_va)
 {
-	struct ttm_validate_buffer tv, *entry;
-	struct radeon_bo_list *vm_bos;
-	struct ww_acquire_ctx ticket;
+	struct radeon_bo_list *vm_bos, *entry;
 	struct list_head list;
+	struct drm_exec exec;
 	unsigned domain;
 	int r;
 
 	INIT_LIST_HEAD(&list);
 
-	tv.bo = &bo_va->bo->tbo;
-	tv.num_shared = 1;
-	list_add(&tv.head, &list);
-
 	vm_bos = radeon_vm_get_bos(rdev, bo_va->vm, &list);
 	if (!vm_bos)
 		return;
 
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
-	if (r)
-		goto error_free;
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		list_for_each_entry(entry, &list, list) {
+			r = drm_exec_prepare_obj(&exec, &entry->robj->tbo.base,
+						 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(r))
+				goto error_cleanup;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	list_for_each_entry(entry, &list, head) {
-		domain = radeon_mem_type_to_domain(entry->bo->resource->mem_type);
+		r = drm_exec_prepare_obj(&exec, &bo_va->bo->tbo.base, 1);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto error_cleanup;
+	}
+
+	list_for_each_entry(entry, &list, list) {
+		domain = radeon_mem_type_to_domain(entry->robj->tbo.resource->mem_type);
 		/* if anything is swapped out don't swap it in here,
 		   just abort and wait for the next CS */
 		if (domain == RADEON_GEM_DOMAIN_CPU)
-			goto error_unreserve;
+			goto error_cleanup;
 	}
 
 	mutex_lock(&bo_va->vm->mutex);
@@ -665,10 +673,8 @@ static void radeon_gem_va_update_vm(struct radeon_device *rdev,
 error_unlock:
 	mutex_unlock(&bo_va->vm->mutex);
 
-error_unreserve:
-	ttm_eu_backoff_reservation(&ticket, &list);
-
-error_free:
+error_cleanup:
+	drm_exec_fini(&exec);
 	kvfree(vm_bos);
 
 	if (r && r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 6c4a6802ca96..7138d1a588d2 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -468,23 +468,26 @@ static u64 radeon_bo_get_threshold_for_moves(struct radeon_device *rdev)
 }
 
 int radeon_bo_list_validate(struct radeon_device *rdev,
-			    struct ww_acquire_ctx *ticket,
+			    struct drm_exec *exec,
 			    struct list_head *head, int ring)
 {
 	struct ttm_operation_ctx ctx = { true, false };
 	struct radeon_bo_list *lobj;
-	struct list_head duplicates;
-	int r;
 	u64 bytes_moved = 0, initial_bytes_moved;
 	u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
+	int r;
 
-	INIT_LIST_HEAD(&duplicates);
-	r = ttm_eu_reserve_buffers(ticket, head, true, &duplicates);
-	if (unlikely(r != 0)) {
-		return r;
+	drm_exec_while_not_all_locked(exec) {
+		list_for_each_entry(lobj, head, list) {
+			r = drm_exec_prepare_obj(exec, &lobj->robj->tbo.base,
+						 1);
+			drm_exec_break_on_contention(exec);
+			if (unlikely(r))
+				return r;
+		}
 	}
 
-	list_for_each_entry(lobj, head, tv.head) {
+	list_for_each_entry(lobj, head, list) {
 		struct radeon_bo *bo = lobj->robj;
 		if (!bo->tbo.pin_count) {
 			u32 domain = lobj->preferred_domains;
@@ -523,7 +526,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 					domain = lobj->allowed_domains;
 					goto retry;
 				}
-				ttm_eu_backoff_reservation(ticket, head);
 				return r;
 			}
 		}
@@ -531,11 +533,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 		lobj->tiling_flags = bo->tiling_flags;
 	}
 
-	list_for_each_entry(lobj, &duplicates, tv.head) {
-		lobj->gpu_offset = radeon_bo_gpu_offset(lobj->robj);
-		lobj->tiling_flags = lobj->robj->tiling_flags;
-	}
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 0a6ef49e990a..04c7c17e8287 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -152,7 +152,7 @@ extern void radeon_bo_force_delete(struct radeon_device *rdev);
 extern int radeon_bo_init(struct radeon_device *rdev);
 extern void radeon_bo_fini(struct radeon_device *rdev);
 extern int radeon_bo_list_validate(struct radeon_device *rdev,
-				   struct ww_acquire_ctx *ticket,
+				   struct drm_exec *exec,
 				   struct list_head *head, int ring);
 extern int radeon_bo_set_tiling_flags(struct radeon_bo *bo,
 				u32 tiling_flags, u32 pitch);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 987cabbf1318..647c4a07d92a 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -142,10 +142,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].robj = vm->page_directory;
 	list[0].preferred_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
-	list[0].tv.bo = &vm->page_directory->tbo;
-	list[0].tv.num_shared = 1;
+	list[0].shared = true;
 	list[0].tiling_flags = 0;
-	list_add(&list[0].tv.head, head);
+	list_add(&list[0].list, head);
 
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
@@ -154,10 +153,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].robj = vm->page_tables[i].bo;
 		list[idx].preferred_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
-		list[idx].tv.bo = &list[idx].robj->tbo;
-		list[idx].tv.num_shared = 1;
+		list[idx].shared = true;
 		list[idx].tiling_flags = 0;
-		list_add(&list[idx++].tv.head, head);
+		list_add(&list[idx++].list, head);
 	}
 
 	return list;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/8] drm/qxl: switch to using drm_exec
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
                   ` (5 preceding siblings ...)
  2022-05-04  7:47 ` [PATCH 6/8] drm/radeon: switch over to drm_exec Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-04  7:47 ` [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx Christian König
  7 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

Just a straightforward conversion without any optimization.

Only compile tested for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/qxl/qxl_drv.h     |  7 ++--
 drivers/gpu/drm/qxl/qxl_release.c | 67 ++++++++++++++++---------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 47c169673088..7bcf890e5a9b 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -38,6 +38,7 @@
 
 #include <drm/drm_crtc.h>
 #include <drm/drm_encoder.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/drm_ioctl.h>
@@ -45,7 +46,6 @@
 #include <drm/qxl_drm.h>
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/ttm/ttm_placement.h>
 
 #include "qxl_dev.h"
@@ -103,7 +103,8 @@ struct qxl_gem {
 };
 
 struct qxl_bo_list {
-	struct ttm_validate_buffer tv;
+	struct qxl_bo		*bo;
+	struct list_head	list;
 };
 
 struct qxl_crtc {
@@ -153,7 +154,7 @@ struct qxl_release {
 	struct qxl_bo *release_bo;
 	uint32_t release_offset;
 	uint32_t surface_release_id;
-	struct ww_acquire_ctx ticket;
+	struct drm_exec	exec;
 	struct list_head bos;
 };
 
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 368d26da0d6a..da7cd9cd58f9 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -121,13 +121,11 @@ qxl_release_free_list(struct qxl_release *release)
 {
 	while (!list_empty(&release->bos)) {
 		struct qxl_bo_list *entry;
-		struct qxl_bo *bo;
 
 		entry = container_of(release->bos.next,
-				     struct qxl_bo_list, tv.head);
-		bo = to_qxl_bo(entry->tv.bo);
-		qxl_bo_unref(&bo);
-		list_del(&entry->tv.head);
+				     struct qxl_bo_list, list);
+		qxl_bo_unref(&entry->bo);
+		list_del(&entry->list);
 		kfree(entry);
 	}
 	release->release_bo = NULL;
@@ -172,8 +170,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 {
 	struct qxl_bo_list *entry;
 
-	list_for_each_entry(entry, &release->bos, tv.head) {
-		if (entry->tv.bo == &bo->tbo)
+	list_for_each_entry(entry, &release->bos, list) {
+		if (entry->bo == bo)
 			return 0;
 	}
 
@@ -182,9 +180,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 		return -ENOMEM;
 
 	qxl_bo_ref(bo);
-	entry->tv.bo = &bo->tbo;
-	entry->tv.num_shared = 0;
-	list_add_tail(&entry->tv.head, &release->bos);
+	entry->bo = bo;
+	list_add_tail(&entry->list, &release->bos);
 	return 0;
 }
 
@@ -221,21 +218,27 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	if (list_is_singular(&release->bos))
 		return 0;
 
-	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
-				     !no_intr, NULL);
-	if (ret)
-		return ret;
-
-	list_for_each_entry(entry, &release->bos, tv.head) {
-		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
-
-		ret = qxl_release_validate_bo(bo);
-		if (ret) {
-			ttm_eu_backoff_reservation(&release->ticket, &release->bos);
-			return ret;
+	drm_exec_init(&release->exec, !no_intr);
+	drm_exec_while_not_all_locked(&release->exec) {
+		list_for_each_entry(entry, &release->bos, list) {
+			ret = drm_exec_prepare_obj(&release->exec,
+						   &entry->bo->tbo.base,
+						   1);
+			drm_exec_break_on_contention(&release->exec);
+			if (ret)
+				goto error;
 		}
 	}
+
+	list_for_each_entry(entry, &release->bos, list) {
+		ret = qxl_release_validate_bo(entry->bo);
+		if (ret)
+			goto error;
+	}
 	return 0;
+error:
+	drm_exec_fini(&release->exec);
+	return ret;
 }
 
 void qxl_release_backoff_reserve_list(struct qxl_release *release)
@@ -245,7 +248,7 @@ void qxl_release_backoff_reserve_list(struct qxl_release *release)
 	if (list_is_singular(&release->bos))
 		return;
 
-	ttm_eu_backoff_reservation(&release->ticket, &release->bos);
+	drm_exec_fini(&release->exec);
 }
 
 int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,
@@ -404,18 +407,18 @@ void qxl_release_unmap(struct qxl_device *qdev,
 
 void qxl_release_fence_buffer_objects(struct qxl_release *release)
 {
-	struct ttm_buffer_object *bo;
 	struct ttm_device *bdev;
-	struct ttm_validate_buffer *entry;
+	struct qxl_bo_list *entry;
 	struct qxl_device *qdev;
+	struct qxl_bo *bo;
 
 	/* if only one object on the release its the release itself
 	   since these objects are pinned no need to reserve */
 	if (list_is_singular(&release->bos) || list_empty(&release->bos))
 		return;
 
-	bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo;
-	bdev = bo->bdev;
+	bo = list_first_entry(&release->bos, struct qxl_bo_list, list)->bo;
+	bdev = bo->tbo.bdev;
 	qdev = container_of(bdev, struct qxl_device, mman.bdev);
 
 	/*
@@ -426,14 +429,12 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 		       release->id | 0xf0000000, release->base.seqno);
 	trace_dma_fence_emit(&release->base);
 
-	list_for_each_entry(entry, &release->bos, head) {
+	list_for_each_entry(entry, &release->bos, list) {
 		bo = entry->bo;
 
-		dma_resv_add_fence(bo->base.resv, &release->base,
+		dma_resv_add_fence(bo->tbo.base.resv, &release->base,
 				   DMA_RESV_USAGE_READ);
-		ttm_bo_move_to_lru_tail_unlocked(bo);
-		dma_resv_unlock(bo->base.resv);
+		ttm_bo_move_to_lru_tail_unlocked(&bo->tbo);
 	}
-	ww_acquire_fini(&release->ticket);
+	drm_exec_fini(&release->exec);
 }
-
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx
  2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
                   ` (6 preceding siblings ...)
  2022-05-04  7:47 ` [PATCH 7/8] drm/qxl: switch to using drm_exec Christian König
@ 2022-05-04  7:47 ` Christian König
  2022-05-09 14:33     ` Daniel Vetter
  7 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2022-05-04  7:47 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

VMWGFX is the only remaining user of this and should probably be moved
over to drm_exec when it starts using GEM as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/Makefile                                  | 4 ++--
 drivers/gpu/drm/vmwgfx/Makefile                               | 2 +-
 drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c            | 3 ++-
 .../drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h     | 2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h                           | 2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_validation.h                    | 2 +-
 6 files changed, 8 insertions(+), 7 deletions(-)
 rename drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c (99%)
 rename {include/drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h (99%)

diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
index f906b22959cf..b05a8477d0d0 100644
--- a/drivers/gpu/drm/ttm/Makefile
+++ b/drivers/gpu/drm/ttm/Makefile
@@ -3,8 +3,8 @@
 # Makefile for the drm device driver.  This driver provides support for the
 
 ttm-y := ttm_tt.o ttm_bo.o ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
-	ttm_execbuf_util.o ttm_range_manager.o ttm_resource.o ttm_pool.o \
-	ttm_device.o ttm_sys_manager.o
+	ttm_range_manager.o ttm_resource.o ttm_pool.o ttm_device.o \
+	ttm_sys_manager.o
 ttm-$(CONFIG_AGP) += ttm_agp_backend.o
 
 obj-$(CONFIG_DRM_TTM) += ttm.o
diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
index eee73b9aa404..c2c836103b23 100644
--- a/drivers/gpu/drm/vmwgfx/Makefile
+++ b/drivers/gpu/drm/vmwgfx/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_hashtab.o vmwgfx_kms.o vmwgfx_drv.o \
-	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o \
+	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o ttm_execbuf_util.o \
 	    vmwgfx_cmd.o vmwgfx_irq.o vmwgfx_ldu.o vmwgfx_ttm_glue.o \
 	    vmwgfx_overlay.o vmwgfx_gmrid_manager.o vmwgfx_fence.o \
 	    vmwgfx_bo.o vmwgfx_scrn.o vmwgfx_context.o \
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
similarity index 99%
rename from drivers/gpu/drm/ttm/ttm_execbuf_util.c
rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
index dbee34a058df..1030f263ba07 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
@@ -26,13 +26,14 @@
  *
  **************************************************************************/
 
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/module.h>
 
+#include "ttm_execbuf_util.h"
+
 static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
 					      struct ttm_validate_buffer *entry)
 {
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
similarity index 99%
rename from include/drm/ttm/ttm_execbuf_util.h
rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
index a99d7fdf2964..47553bf31c73 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
@@ -33,7 +33,7 @@
 
 #include <linux/list.h>
 
-#include "ttm_bo_api.h"
+#include <drm/ttm/ttm_bo_api.h>
 
 /**
  * struct ttm_validate_buffer
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index be19aa6e1f13..cae306c60af9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -37,8 +37,8 @@
 #include <drm/drm_rect.h>
 
 #include <drm/ttm/ttm_bo_driver.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 
+#include "ttm_execbuf_util.h"
 #include "ttm_object.h"
 
 #include "vmwgfx_fence.h"
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
index f21df053882b..3613a3d52528 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
@@ -31,7 +31,7 @@
 #include <linux/list.h>
 #include <linux/ww_mutex.h>
 
-#include <drm/ttm/ttm_execbuf_util.h>
+#include "ttm_execbuf_util.h"
 
 #include "vmwgfx_hashtab.h"
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
@ 2022-05-09 14:31     ` Daniel Vetter
  2022-06-09  0:05   ` Felix Kuehling
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:31 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel, Christian König

On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existinc TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
> 
> v2: drop xarray and use dynamic resized array instead, the locking
>     overhead is unecessary and measureable.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  Documentation/gpu/drm-mm.rst |  12 ++
>  drivers/gpu/drm/Kconfig      |   7 +
>  drivers/gpu/drm/Makefile     |   2 +
>  drivers/gpu/drm/drm_exec.c   | 295 +++++++++++++++++++++++++++++++++++
>  include/drm/drm_exec.h       | 144 +++++++++++++++++
>  5 files changed, 460 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_exec.c
>  create mode 100644 include/drm/drm_exec.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index f32ccce5722d..bf7dd2a78e9b 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>  .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>     :export:
>  
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>  GPU Scheduler
>  =============
>  
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index e88c497fa010..1b35c10df263 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -179,6 +179,12 @@ config DRM_TTM
>  	  GPU memory types. Will be enabled automatically if a device driver
>  	  uses it.
>  
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>  config DRM_BUDDY
>  	tristate
>  	depends on DRM
> @@ -252,6 +258,7 @@ config DRM_AMDGPU
>  	select DRM_SCHED
>  	select DRM_TTM
>  	select DRM_TTM_HELPER
> +	select DRM_EXEC
>  	select POWER_SUPPLY
>  	select HWMON
>  	select BACKLIGHT_CLASS_DEVICE
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 15fe3163f822..ee8573b683f3 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -37,6 +37,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>  #
>  # Memory-management helpers
>  #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>  
>  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>  
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..ed2106c22786
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,295 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_lock(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Initialize the drm_exec_objects container */
> +static void drm_exec_objects_init(struct drm_exec_objects *container)
> +{
> +	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	container->max_objects = container->objects ?
> +		PAGE_SIZE / sizeof(void *) : 0;
> +	container->num_objects = 0;
> +}
> +
> +/* Cleanup the drm_exec_objects container */
> +static void drm_exec_objects_fini(struct drm_exec_objects *container)
> +{
> +	kvfree(container->objects);
> +}
> +
> +/* Make sure we have enough room and add an object the container */
> +static int drm_exec_objects_add(struct drm_exec_objects *container,
> +				struct drm_gem_object *obj)
> +{
> +	if (unlikely(container->num_objects == container->max_objects)) {
> +		size_t size = container->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		container->objects = tmp;
> +		container->max_objects += PAGE_SIZE / sizeof(void *);

Might be worth it to query the actual allocation size here, since if
it's kmalloc the generic buckets only cover doubling of sizes, so once
it's big the real allocation grows a lot quicker than PAGE_SIZE.

But then again krealloc already checks this internally, so maybe it's
better to not break the abstraction.
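
Something like the following is what I mean, purely as an untested sketch
on top of your kvrealloc() call (growing geometrically instead of adding
PAGE_SIZE each time, so the array size tracks the kmalloc buckets):

	size_t size = container->max_objects * sizeof(void *);
	size_t new_size = max_t(size_t, size * 2, PAGE_SIZE);
	void *tmp;

	/* grow geometrically instead of linearly */
	tmp = kvrealloc(container->objects, size, new_size, GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;

	container->objects = tmp;
	container->max_objects = new_size / sizeof(void *);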

> +	}
> +	drm_gem_object_get(obj);
> +	container->objects[container->num_objects++] = obj;
> +	return 0;
> +}
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_duplicate_object(exec, index, obj)
> +		drm_gem_object_put(obj);
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked and duplicate
> + * objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	drm_exec_objects_init(&exec->locked);
> +	drm_exec_objects_init(&exec->duplicates);
> +	exec->contended = DRM_EXEC_DUMMY;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finilize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	drm_exec_objects_fini(&exec->locked);
> +	drm_exec_objects_fini(&exec->duplicates);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);

Not sure why this is here instead of in _init()? I thought you were
playing some really dangerous tricks with re-initting the acquire ctx,
which would at least be questionable, but it does not look like that.
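
I.e. I would have expected something like this untested sketch, with
drm_exec_fini() then calling ww_acquire_fini() unconditionally and the
DRM_EXEC_DUMMY branch in drm_exec_cleanup() only clearing ->contended:

	void drm_exec_init(struct drm_exec *exec, bool interruptible)
	{
		exec->interruptible = interruptible;
		drm_exec_objects_init(&exec->locked);
		drm_exec_objects_init(&exec->duplicates);
		ww_acquire_init(&exec->ticket, &reservation_ww_class);
		exec->contended = DRM_EXEC_DUMMY;
	}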

> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->locked.num_objects = 0;
> +	exec->duplicates.num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the xa and reserve fences */
> +static int drm_exec_obj_locked(struct drm_exec_objects *container,
> +			       struct drm_gem_object *obj,
> +			       unsigned int num_fences)
> +{
> +	int ret;
> +
> +	if (container) {
> +		ret = drm_exec_objects_add(container, obj);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (num_fences) {
> +		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> +		if (ret)
> +			goto error_erase;
> +	}
> +
> +	return 0;
> +
> +error_erase:
> +	if (container) {
> +		--container->num_objects;
> +		drm_gem_object_put(obj);
> +	}
> +	return ret;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
> +	if (unlikely(ret))
> +		dma_resv_unlock(obj->resv);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * succesfully locked objects are put into the locked container. Duplicates
> + * detected as well and automatically moved into the duplicates container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY)) {
> +		struct drm_exec_objects *container = &exec->duplicates;
> +
> +		/*
> +		 * If this is the first locked GEM object it was most likely
> +		 * just contended. So don't add it to the duplicates, just
> +		 * reserve the fence slots.
> +		 */
> +		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
> +			container = NULL;
> +
> +		ret = drm_exec_obj_locked(container, obj, num_fences);
> +		if (ret)
> +			return ret;
> +
> +	} else if (unlikely(ret)) {
> +		return ret;
> +
> +	} else {
> +		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
> +		if (ret)
> +			goto error_unlock;
> +	}
> +
> +	drm_gem_object_get(obj);
> +	return 0;
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..f73981c6292e
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,144 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec_objects - Container for GEM objects in a drm_exec
> + */
> +struct drm_exec_objects {
> +	unsigned int		num_objects;
> +	unsigned int		max_objects;
> +	struct drm_gem_object	**objects;
> +};
> +
> +/**
> + * drm_exec_objects_for_each - iterate over all the objects inside the container
> + */
> +#define drm_exec_objects_for_each(array, index, obj)		\
> +	for (index = 0, obj = (array)->objects[0];		\
> +	     index < (array)->num_objects;			\
> +	     ++index, obj = (array)->objects[index])
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @locked: container for the locked GEM objects
> +	 */
> +	struct drm_exec_objects	locked;
> +
> +	/**
> +	 * @duplicates: container for the duplicated GEM objects
> +	 */
> +	struct drm_exec_objects	duplicates;
> +
> +	/**
> +	 * @contended: contended GEM object we backet of for.
> +	 */
> +	struct drm_gem_object	*contended;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->locked, index, obj)
> +
> +/**
> + * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the duplicate GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinal loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinal loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +/**
> + * drm_exec_has_duplicates - check for duplicated GEM object
> + * @exec: drm_exec object
> + *
> + * Return true if the drm_exec object has encountered some already locked GEM
> + * objects while trying to lock them. This can happen if multiple GEM objects
> + * share the same underlying resv object.
> + */
> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> +{
> +	return exec->duplicates.num_objects > 0;

Definitely an aside, but in our i915 efforts to get rid of temporary pins
we ran into some fun where the eviction code couldn't differentiate between
memory we need reserved for the CS and memory we just keep locked because
we evicted it and fun stuff like that. So maybe we need a bit more
tracking here eventually, but that's only when we have this somehow glued
into ttm eviction code.

Also the even more massive step would be to glue this into dma-buf so you
can do dynamic dma-buf eviction and still keep track of all the buffers. I
think with some clever pointer tagging and a bit more indirection we could
nest drm_exec structures (so that a driver could insert its entire
drm_exec structure with a drm_exec-level callback for handling refcounting
and stuff like that).

So anyway I think this all looks good, just one more thing before I think
we should land this:

The gem helper drm_gem_lock_reservations() has something which is
practically compatible already, except that you bulk-add the entire set of
objects. I think if you add a bulk-prepare function then we could also
replace all those. Maybe even nicer if the bulk-prepare takes the array of
handles and does the handle lookup too, but at least something which can
substitute drm_gem_lock_reservations() with drm_exec would be nice to
validate the helpers a bit more and really make sure we only have one of
them left.
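
Something roughly along these lines is what I have in mind - completely
untested sketch, and drm_exec_prepare_array() is just a name I made up:

static int drm_exec_prepare_array(struct drm_exec *exec,
                                  struct drm_gem_object **objs,
                                  unsigned int num_objs,
                                  unsigned int num_fences)
{
        unsigned int i;
        int ret;

        /*
         * Callers would still wrap this in drm_exec_while_not_all_locked()
         * and use drm_exec_continue_on_contention() on -EDEADLK, exactly
         * like with the single-object variant.
         */
        for (i = 0; i < num_objs; ++i) {
                ret = drm_exec_prepare_obj(exec, objs[i], num_fences);
                if (ret)
                        return ret;
        }
        return 0;
}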

Thoughts?
-Daniel

> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +
> +#endif
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
@ 2022-05-09 14:31     ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:31 UTC (permalink / raw)
  To: Christian König; +Cc: daniel, amd-gfx, dri-devel, Christian König

On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existinc TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
> 
> v2: drop xarray and use dynamic resized array instead, the locking
>     overhead is unecessary and measureable.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  Documentation/gpu/drm-mm.rst |  12 ++
>  drivers/gpu/drm/Kconfig      |   7 +
>  drivers/gpu/drm/Makefile     |   2 +
>  drivers/gpu/drm/drm_exec.c   | 295 +++++++++++++++++++++++++++++++++++
>  include/drm/drm_exec.h       | 144 +++++++++++++++++
>  5 files changed, 460 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_exec.c
>  create mode 100644 include/drm/drm_exec.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index f32ccce5722d..bf7dd2a78e9b 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>  .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>     :export:
>  
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>  GPU Scheduler
>  =============
>  
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index e88c497fa010..1b35c10df263 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -179,6 +179,12 @@ config DRM_TTM
>  	  GPU memory types. Will be enabled automatically if a device driver
>  	  uses it.
>  
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>  config DRM_BUDDY
>  	tristate
>  	depends on DRM
> @@ -252,6 +258,7 @@ config DRM_AMDGPU
>  	select DRM_SCHED
>  	select DRM_TTM
>  	select DRM_TTM_HELPER
> +	select DRM_EXEC
>  	select POWER_SUPPLY
>  	select HWMON
>  	select BACKLIGHT_CLASS_DEVICE
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 15fe3163f822..ee8573b683f3 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -37,6 +37,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>  #
>  # Memory-management helpers
>  #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>  
>  obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>  
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..ed2106c22786
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,295 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked, fence slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_lock(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct drm_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Initialize the drm_exec_objects container */
> +static void drm_exec_objects_init(struct drm_exec_objects *container)
> +{
> +	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	container->max_objects = container->objects ?
> +		PAGE_SIZE / sizeof(void *) : 0;
> +	container->num_objects = 0;
> +}
> +
> +/* Cleanup the drm_exec_objects container */
> +static void drm_exec_objects_fini(struct drm_exec_objects *container)
> +{
> +	kvfree(container->objects);
> +}
> +
> +/* Make sure we have enough room and add an object to the container */
> +static int drm_exec_objects_add(struct drm_exec_objects *container,
> +				struct drm_gem_object *obj)
> +{
> +	if (unlikely(container->num_objects == container->max_objects)) {
> +		size_t size = container->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		container->objects = tmp;
> +		container->max_objects += PAGE_SIZE / sizeof(void *);

Might be worth it to inquire the actual allocation size here, since if
it's kmalloc the generic buckets only cover doubling of sizes, so once
it's big it goes up a lot quicker than PAGE_SIZE.

But also krealloc checks this internally already so maybe better to not
break the abstraction.

> +	}
> +	drm_gem_object_get(obj);
> +	container->objects[container->num_objects++] = obj;
> +	return 0;
> +}
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_duplicate_object(exec, index, obj)
> +		drm_gem_object_put(obj);
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked and duplicate
> + * objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	drm_exec_objects_init(&exec->locked);
> +	drm_exec_objects_init(&exec->duplicates);
> +	exec->contended = DRM_EXEC_DUMMY;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	drm_exec_objects_fini(&exec->locked);
> +	drm_exec_objects_fini(&exec->duplicates);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);

Not sure why this is here instead of in _init()? I thought you were playing
some really dangerous tricks with re-initting the acquire ctx, which would
at least be questionable, but it does not look like that.

> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->locked.num_objects = 0;
> +	exec->duplicates.num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the xa and reserve fences */
> +static int drm_exec_obj_locked(struct drm_exec_objects *container,
> +			       struct drm_gem_object *obj,
> +			       unsigned int num_fences)
> +{
> +	int ret;
> +
> +	if (container) {
> +		ret = drm_exec_objects_add(container, obj);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (num_fences) {
> +		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> +		if (ret)
> +			goto error_erase;
> +	}
> +
> +	return 0;
> +
> +error_erase:
> +	if (container) {
> +		--container->num_objects;
> +		drm_gem_object_put(obj);
> +	}
> +	return ret;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
> +	if (unlikely(ret))
> +		dma_resv_unlock(obj->resv);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container. Duplicates
> + * are detected as well and automatically moved into the duplicates container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY)) {
> +		struct drm_exec_objects *container = &exec->duplicates;
> +
> +		/*
> +		 * If this is the first locked GEM object it was most likely
> +		 * just contended. So don't add it to the duplicates, just
> +		 * reserve the fence slots.
> +		 */
> +		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
> +			container = NULL;
> +
> +		ret = drm_exec_obj_locked(container, obj, num_fences);
> +		if (ret)
> +			return ret;
> +
> +	} else if (unlikely(ret)) {
> +		return ret;
> +
> +	} else {
> +		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
> +		if (ret)
> +			goto error_unlock;
> +	}
> +
> +	drm_gem_object_get(obj);
> +	return 0;
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..f73981c6292e
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,144 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec_objects - Container for GEM objects in a drm_exec
> + */
> +struct drm_exec_objects {
> +	unsigned int		num_objects;
> +	unsigned int		max_objects;
> +	struct drm_gem_object	**objects;
> +};
> +
> +/**
> + * drm_exec_objects_for_each - iterate over all the objects inside the container
> + */
> +#define drm_exec_objects_for_each(array, index, obj)		\
> +	for (index = 0, obj = (array)->objects[0];		\
> +	     index < (array)->num_objects;			\
> +	     ++index, obj = (array)->objects[index])
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @locked: container for the locked GEM objects
> +	 */
> +	struct drm_exec_objects	locked;
> +
> +	/**
> +	 * @duplicates: container for the duplicated GEM objects
> +	 */
> +	struct drm_exec_objects	duplicates;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for.
> +	 */
> +	struct drm_gem_object	*contended;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->locked, index, obj)
> +
> +/**
> + * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the duplicate GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinate loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinate loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +/**
> + * drm_exec_has_duplicates - check for duplicated GEM object
> + * @exec: drm_exec object
> + *
> + * Return true if the drm_exec object has encountered some already locked GEM
> + * objects while trying to lock them. This can happen if multiple GEM objects
> + * share the same underlying resv object.
> + */
> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> +{
> +	return exec->duplicates.num_objects > 0;

Definitely an aside, but in our i915 efforts to get rid of temporary pins
we ran into some fun where the eviction code couldn't differentiate between
memory we need reserved for the CS and memory we just keep locked because
we evicted it, and fun stuff like that. So maybe we need a bit more
tracking here eventually, but that's only relevant once we have this
somehow glued into the ttm eviction code.

Also the even more massive step would be to glue this into dma-buf so you
can do dynamic dma-buf eviction and still keep track of all the buffers. I
think with some clever pointer tagging and a bit more indirection we could
nest drm_exec structures (so that a driver could insert its entire
drm_exec structure with a drm_exec-level callback for handling refcounting
and stuff like that).

So anyway I think this all looks good, just one more thing before I think
we should land this:

The gem helper drm_gem_lock_reservations() has something which is
practically compatible already, except that you bulk-add the entire set of
objects. I think if you add a bulk-prepare function then we could also
replace all those. Maybe even nicer if the bulk-prepare takes the array of
handles and does the handle lookup too, but at least something which can
substitute drm_gem_lock_reservations() with drm_exec would be nice to
validate the helpers a bit more and really make sure we only have one of
them left.

Thoughts?
-Daniel

> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +
> +#endif
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx
  2022-05-04  7:47 ` [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx Christian König
@ 2022-05-09 14:33     ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:33 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel, Christian König

On Wed, May 04, 2022 at 09:47:39AM +0200, Christian König wrote:
> VMWGFX is the only remaining user of this and should probably moved over
> to drm_exec when it starts using GEM as well.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

I guess this is a bit annoying since it means we can't require drm_exec in
ttm eviction, but we can make it an optional pointer in the ttm ctx. Needs
to be optional anyway since we won't roll this out to all drivers, and
then we can optionally use it to handle the locking in eviction instead of
the current lock dropping tricks.
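
Roughly what I have in mind - just a sketch, the new member obviously
doesn't exist yet:

struct ttm_operation_ctx {
        /* ... existing members (interruptible, no_wait_gpu, resv, ...) ... */

        /*
         * Optional. If a driver sets this, eviction could lock and track
         * victim BOs through the drm_exec context instead of the current
         * trylock/lock-dropping tricks. Drivers not using drm_exec just
         * leave it NULL.
         */
        struct drm_exec *exec;
};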

I'm assuming at least that's your goal here, or is there a different one?
-Daniel

> ---
>  drivers/gpu/drm/ttm/Makefile                                  | 4 ++--
>  drivers/gpu/drm/vmwgfx/Makefile                               | 2 +-
>  drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c            | 3 ++-
>  .../drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h     | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h                           | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_validation.h                    | 2 +-
>  6 files changed, 8 insertions(+), 7 deletions(-)
>  rename drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c (99%)
>  rename {include/drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h (99%)
> 
> diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
> index f906b22959cf..b05a8477d0d0 100644
> --- a/drivers/gpu/drm/ttm/Makefile
> +++ b/drivers/gpu/drm/ttm/Makefile
> @@ -3,8 +3,8 @@
>  # Makefile for the drm device driver.  This driver provides support for the
>  
>  ttm-y := ttm_tt.o ttm_bo.o ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
> -	ttm_execbuf_util.o ttm_range_manager.o ttm_resource.o ttm_pool.o \
> -	ttm_device.o ttm_sys_manager.o
> +	ttm_range_manager.o ttm_resource.o ttm_pool.o ttm_device.o \
> +	ttm_sys_manager.o
>  ttm-$(CONFIG_AGP) += ttm_agp_backend.o
>  
>  obj-$(CONFIG_DRM_TTM) += ttm.o
> diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
> index eee73b9aa404..c2c836103b23 100644
> --- a/drivers/gpu/drm/vmwgfx/Makefile
> +++ b/drivers/gpu/drm/vmwgfx/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_hashtab.o vmwgfx_kms.o vmwgfx_drv.o \
> -	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o \
> +	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o ttm_execbuf_util.o \
>  	    vmwgfx_cmd.o vmwgfx_irq.o vmwgfx_ldu.o vmwgfx_ttm_glue.o \
>  	    vmwgfx_overlay.o vmwgfx_gmrid_manager.o vmwgfx_fence.o \
>  	    vmwgfx_bo.o vmwgfx_scrn.o vmwgfx_context.o \
> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> similarity index 99%
> rename from drivers/gpu/drm/ttm/ttm_execbuf_util.c
> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> index dbee34a058df..1030f263ba07 100644
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> @@ -26,13 +26,14 @@
>   *
>   **************************************************************************/
>  
> -#include <drm/ttm/ttm_execbuf_util.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_placement.h>
>  #include <linux/wait.h>
>  #include <linux/sched.h>
>  #include <linux/module.h>
>  
> +#include "ttm_execbuf_util.h"
> +
>  static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
>  					      struct ttm_validate_buffer *entry)
>  {
> diff --git a/include/drm/ttm/ttm_execbuf_util.h b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> similarity index 99%
> rename from include/drm/ttm/ttm_execbuf_util.h
> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> index a99d7fdf2964..47553bf31c73 100644
> --- a/include/drm/ttm/ttm_execbuf_util.h
> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> @@ -33,7 +33,7 @@
>  
>  #include <linux/list.h>
>  
> -#include "ttm_bo_api.h"
> +#include <drm/ttm/ttm_bo_api.h>
>  
>  /**
>   * struct ttm_validate_buffer
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> index be19aa6e1f13..cae306c60af9 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> @@ -37,8 +37,8 @@
>  #include <drm/drm_rect.h>
>  
>  #include <drm/ttm/ttm_bo_driver.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>  
> +#include "ttm_execbuf_util.h"
>  #include "ttm_object.h"
>  
>  #include "vmwgfx_fence.h"
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> index f21df053882b..3613a3d52528 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> @@ -31,7 +31,7 @@
>  #include <linux/list.h>
>  #include <linux/ww_mutex.h>
>  
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include "ttm_execbuf_util.h"
>  
>  #include "vmwgfx_hashtab.h"
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx
@ 2022-05-09 14:33     ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-09 14:33 UTC (permalink / raw)
  To: Christian König; +Cc: daniel, amd-gfx, dri-devel, Christian König

On Wed, May 04, 2022 at 09:47:39AM +0200, Christian König wrote:
> VMWGFX is the only remaining user of this and should probably moved over
> to drm_exec when it starts using GEM as well.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

I guess this is a bit annoying since it means we can't require drm_exec in
ttm eviction, but we can make it an optional pointer in the ttm ctx. Needs
to be optional anyway since we won't roll this out to all drivers, and
then we can optionally use it to handle the locking in eviction instead of
the current lock dropping tricks.

I'm assuming at least that's your goal here, or is there a different one?
-Daniel

> ---
>  drivers/gpu/drm/ttm/Makefile                                  | 4 ++--
>  drivers/gpu/drm/vmwgfx/Makefile                               | 2 +-
>  drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c            | 3 ++-
>  .../drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h     | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h                           | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_validation.h                    | 2 +-
>  6 files changed, 8 insertions(+), 7 deletions(-)
>  rename drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c (99%)
>  rename {include/drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h (99%)
> 
> diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
> index f906b22959cf..b05a8477d0d0 100644
> --- a/drivers/gpu/drm/ttm/Makefile
> +++ b/drivers/gpu/drm/ttm/Makefile
> @@ -3,8 +3,8 @@
>  # Makefile for the drm device driver.  This driver provides support for the
>  
>  ttm-y := ttm_tt.o ttm_bo.o ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
> -	ttm_execbuf_util.o ttm_range_manager.o ttm_resource.o ttm_pool.o \
> -	ttm_device.o ttm_sys_manager.o
> +	ttm_range_manager.o ttm_resource.o ttm_pool.o ttm_device.o \
> +	ttm_sys_manager.o
>  ttm-$(CONFIG_AGP) += ttm_agp_backend.o
>  
>  obj-$(CONFIG_DRM_TTM) += ttm.o
> diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
> index eee73b9aa404..c2c836103b23 100644
> --- a/drivers/gpu/drm/vmwgfx/Makefile
> +++ b/drivers/gpu/drm/vmwgfx/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_hashtab.o vmwgfx_kms.o vmwgfx_drv.o \
> -	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o \
> +	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o ttm_execbuf_util.o \
>  	    vmwgfx_cmd.o vmwgfx_irq.o vmwgfx_ldu.o vmwgfx_ttm_glue.o \
>  	    vmwgfx_overlay.o vmwgfx_gmrid_manager.o vmwgfx_fence.o \
>  	    vmwgfx_bo.o vmwgfx_scrn.o vmwgfx_context.o \
> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> similarity index 99%
> rename from drivers/gpu/drm/ttm/ttm_execbuf_util.c
> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> index dbee34a058df..1030f263ba07 100644
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
> @@ -26,13 +26,14 @@
>   *
>   **************************************************************************/
>  
> -#include <drm/ttm/ttm_execbuf_util.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_placement.h>
>  #include <linux/wait.h>
>  #include <linux/sched.h>
>  #include <linux/module.h>
>  
> +#include "ttm_execbuf_util.h"
> +
>  static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
>  					      struct ttm_validate_buffer *entry)
>  {
> diff --git a/include/drm/ttm/ttm_execbuf_util.h b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> similarity index 99%
> rename from include/drm/ttm/ttm_execbuf_util.h
> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> index a99d7fdf2964..47553bf31c73 100644
> --- a/include/drm/ttm/ttm_execbuf_util.h
> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
> @@ -33,7 +33,7 @@
>  
>  #include <linux/list.h>
>  
> -#include "ttm_bo_api.h"
> +#include <drm/ttm/ttm_bo_api.h>
>  
>  /**
>   * struct ttm_validate_buffer
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> index be19aa6e1f13..cae306c60af9 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> @@ -37,8 +37,8 @@
>  #include <drm/drm_rect.h>
>  
>  #include <drm/ttm/ttm_bo_driver.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>  
> +#include "ttm_execbuf_util.h"
>  #include "ttm_object.h"
>  
>  #include "vmwgfx_fence.h"
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> index f21df053882b..3613a3d52528 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> @@ -31,7 +31,7 @@
>  #include <linux/list.h>
>  #include <linux/ww_mutex.h>
>  
> -#include <drm/ttm/ttm_execbuf_util.h>
> +#include "ttm_execbuf_util.h"
>  
>  #include "vmwgfx_hashtab.h"
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-05-09 14:31     ` Daniel Vetter
  (?)
@ 2022-05-09 15:01     ` Christian König
  2022-05-11 15:23         ` Daniel Vetter
  -1 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2022-05-09 15:01 UTC (permalink / raw)
  To: Daniel Vetter, Christian König; +Cc: amd-gfx, dri-devel

Am 09.05.22 um 16:31 schrieb Daniel Vetter:
> On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
>> [SNIP]
>> +/* Make sure we have enough room and add an object the container */
>> +static int drm_exec_objects_add(struct drm_exec_objects *container,
>> +				struct drm_gem_object *obj)
>> +{
>> +	if (unlikely(container->num_objects == container->max_objects)) {
>> +		size_t size = container->max_objects * sizeof(void *);
>> +		void *tmp;
>> +
>> +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
>> +				GFP_KERNEL);
>> +		if (!tmp)
>> +			return -ENOMEM;
>> +
>> +		container->objects = tmp;
>> +		container->max_objects += PAGE_SIZE / sizeof(void *);
> Might be worth it to inquire the actual allocation size here, since if
> it's kmalloc the generic buckets only cover doubling of sizes, so once
> it's big it goes up a lot quicker than PAGE_SIZE.
>
> But also krealloc checks this internally already so maybe better to not
> break the abstraction.

How can I actually do this? ksize() only works with kmalloc().

Or do we have a function to figure out if vmalloc or kmalloc was used by
kvrealloc()?

>> [SNIP]
>> +/**
>> + * drm_exec_cleanup - cleanup when contention is detected
>> + * @exec: the drm_exec object to cleanup
>> + *
>> + * Cleanup the current state and return true if we should stay inside the retry
>> + * loop, false if there wasn't any contention detected and we can keep the
>> + * objects locked.
>> + */
>> +bool drm_exec_cleanup(struct drm_exec *exec)
>> +{
>> +	if (likely(!exec->contended)) {
>> +		ww_acquire_done(&exec->ticket);
>> +		return false;
>> +	}
>> +
>> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
>> +		exec->contended = NULL;
>> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> Not sure why this is here instead of in _init()? I thought you're playing
> some really dangerous tricks with re-initting the acquire ctx, which would
> at least be questionable, but does not look like that.

That was my initial design, but the problem with this approach is that 
all locks taken between drm_exec_init() and the loop suddenly have a 
lockdep dependency on reservation_ww_class. And that in turn goes boom 
immediately.

Took me a moment to realize what's wrong with that as well.

> [SNIP]
> +/**
> + * drm_exec_has_duplicates - check for duplicated GEM object
> + * @exec: drm_exec object
> + *
> + * Return true if the drm_exec object has encountered some already locked GEM
> + * objects while trying to lock them. This can happen if multiple GEM objects
> + * share the same underlying resv object.
> + */
> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> +{
> +	return exec->duplicates.num_objects > 0;
> Definitely an aside, but in our i915 efforts to get rid of temporary pins
> we run into some fun where the eviction code couldn't differentiate from
> memory we need reserved for the CS and memory we just keep locked because
> we evicted it and fun stuff like that. So maybe we need a bit more
> tracking here eventually, but that's only when we have this somehow glued
> into ttm eviction code.

Hehe, yeah that's what I was thinking about as well. But then I thought
one step at a time.

> Also the even more massive step would be to glue this into dma-buf so you
> can do dynamic dma-buf eviction and still keep track of all the buffers. I
> think with some clever pointer tagging and a bit more indirection we could
> nest drm_exec structures (so that a driver could insert it's entire
> drm_exec structure with a drm_exec-level callback for handling refcounting
> and stuff like that).

I considered in which component to put this quite a bit as well, but 
then intentionally decided against DMA-buf.

One major reason was that not all buffers which need to be locked this
way are actually exported as DMA-buf.

Another reason is that DMA-buf doesn't necessarily need a concept of an
execution context. As far as I can see that's something GPU/DRM driver 
specific.

> So anyway I think this all looks good, just one more thing before I think
> we should land this:
>
> gem helpers in drm_gem_lock_reservations() has something which is
> practically compatible already, except that you bulk-add the entire set of
> objects. I think if you add a bulk-prepare function then we could also
> replace all those. Maybe even nicer if the bulk-prepare takes the array of
> handles and does the handle lookup too, but at least something which can
> subsititue drm_gem_lock_reservations with drm_exec would be nice to
> validate the helpers a bit more and really make sure we only have one of
> them left.

I was considering that as well, but then also thought one step at a 
time. Not sure if it's possible to look up handles without running into 
some locking fun, though.

Thanks for the review,
Christian.

>
> Thoughts?
> -Daniel
>
>> +}
>> +
>> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
>> +void drm_exec_fini(struct drm_exec *exec);
>> +bool drm_exec_cleanup(struct drm_exec *exec);
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
>> +			 unsigned int num_fences);
>> +
>> +#endif
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx
  2022-05-09 14:33     ` Daniel Vetter
  (?)
@ 2022-05-09 15:02     ` Christian König
  -1 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-09 15:02 UTC (permalink / raw)
  To: Daniel Vetter, Christian König; +Cc: amd-gfx, dri-devel

Am 09.05.22 um 16:33 schrieb Daniel Vetter:
> On Wed, May 04, 2022 at 09:47:39AM +0200, Christian König wrote:
>> VMWGFX is the only remaining user of this and should probably moved over
>> to drm_exec when it starts using GEM as well.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
> I guess this is a bit annoying since it means we can't require drm_exec in
> ttm eviction, but we can make it an optional pointer in the ttm ctx. Needs
> to be optional anyway since we won't roll this out to all drivers, and
> then we can optionally use it to handle the locking in eviction instead of
> the current lock dropping tricks.
>
> I'm assuming at least that's your goal here, or is there a different one?

Yes, exactly that's the long term plan.

Christian.

> -Daniel
>
>> ---
>>   drivers/gpu/drm/ttm/Makefile                                  | 4 ++--
>>   drivers/gpu/drm/vmwgfx/Makefile                               | 2 +-
>>   drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c            | 3 ++-
>>   .../drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h     | 2 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.h                           | 2 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_validation.h                    | 2 +-
>>   6 files changed, 8 insertions(+), 7 deletions(-)
>>   rename drivers/gpu/drm/{ttm => vmwgfx}/ttm_execbuf_util.c (99%)
>>   rename {include/drm/ttm => drivers/gpu/drm/vmwgfx}/ttm_execbuf_util.h (99%)
>>
>> diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
>> index f906b22959cf..b05a8477d0d0 100644
>> --- a/drivers/gpu/drm/ttm/Makefile
>> +++ b/drivers/gpu/drm/ttm/Makefile
>> @@ -3,8 +3,8 @@
>>   # Makefile for the drm device driver.  This driver provides support for the
>>   
>>   ttm-y := ttm_tt.o ttm_bo.o ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
>> -	ttm_execbuf_util.o ttm_range_manager.o ttm_resource.o ttm_pool.o \
>> -	ttm_device.o ttm_sys_manager.o
>> +	ttm_range_manager.o ttm_resource.o ttm_pool.o ttm_device.o \
>> +	ttm_sys_manager.o
>>   ttm-$(CONFIG_AGP) += ttm_agp_backend.o
>>   
>>   obj-$(CONFIG_DRM_TTM) += ttm.o
>> diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
>> index eee73b9aa404..c2c836103b23 100644
>> --- a/drivers/gpu/drm/vmwgfx/Makefile
>> +++ b/drivers/gpu/drm/vmwgfx/Makefile
>> @@ -1,6 +1,6 @@
>>   # SPDX-License-Identifier: GPL-2.0
>>   vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_hashtab.o vmwgfx_kms.o vmwgfx_drv.o \
>> -	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o \
>> +	    vmwgfx_ioctl.o vmwgfx_resource.o vmwgfx_ttm_buffer.o ttm_execbuf_util.o \
>>   	    vmwgfx_cmd.o vmwgfx_irq.o vmwgfx_ldu.o vmwgfx_ttm_glue.o \
>>   	    vmwgfx_overlay.o vmwgfx_gmrid_manager.o vmwgfx_fence.o \
>>   	    vmwgfx_bo.o vmwgfx_scrn.o vmwgfx_context.o \
>> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
>> similarity index 99%
>> rename from drivers/gpu/drm/ttm/ttm_execbuf_util.c
>> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
>> index dbee34a058df..1030f263ba07 100644
>> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
>> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.c
>> @@ -26,13 +26,14 @@
>>    *
>>    **************************************************************************/
>>   
>> -#include <drm/ttm/ttm_execbuf_util.h>
>>   #include <drm/ttm/ttm_bo_driver.h>
>>   #include <drm/ttm/ttm_placement.h>
>>   #include <linux/wait.h>
>>   #include <linux/sched.h>
>>   #include <linux/module.h>
>>   
>> +#include "ttm_execbuf_util.h"
>> +
>>   static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
>>   					      struct ttm_validate_buffer *entry)
>>   {
>> diff --git a/include/drm/ttm/ttm_execbuf_util.h b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
>> similarity index 99%
>> rename from include/drm/ttm/ttm_execbuf_util.h
>> rename to drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
>> index a99d7fdf2964..47553bf31c73 100644
>> --- a/include/drm/ttm/ttm_execbuf_util.h
>> +++ b/drivers/gpu/drm/vmwgfx/ttm_execbuf_util.h
>> @@ -33,7 +33,7 @@
>>   
>>   #include <linux/list.h>
>>   
>> -#include "ttm_bo_api.h"
>> +#include <drm/ttm/ttm_bo_api.h>
>>   
>>   /**
>>    * struct ttm_validate_buffer
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
>> index be19aa6e1f13..cae306c60af9 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
>> @@ -37,8 +37,8 @@
>>   #include <drm/drm_rect.h>
>>   
>>   #include <drm/ttm/ttm_bo_driver.h>
>> -#include <drm/ttm/ttm_execbuf_util.h>
>>   
>> +#include "ttm_execbuf_util.h"
>>   #include "ttm_object.h"
>>   
>>   #include "vmwgfx_fence.h"
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
>> index f21df053882b..3613a3d52528 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
>> @@ -31,7 +31,7 @@
>>   #include <linux/list.h>
>>   #include <linux/ww_mutex.h>
>>   
>> -#include <drm/ttm/ttm_execbuf_util.h>
>> +#include "ttm_execbuf_util.h"
>>   
>>   #include "vmwgfx_hashtab.h"
>>   
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-05-09 15:01     ` Christian König
@ 2022-05-11 15:23         ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-11 15:23 UTC (permalink / raw)
  To: Christian König; +Cc: Christian König, amd-gfx, dri-devel

On Mon, May 09, 2022 at 05:01:33PM +0200, Christian König wrote:
> Am 09.05.22 um 16:31 schrieb Daniel Vetter:
> > On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
> > > [SNIP]
> > > +/* Make sure we have enough room and add an object the container */
> > > +static int drm_exec_objects_add(struct drm_exec_objects *container,
> > > +				struct drm_gem_object *obj)
> > > +{
> > > +	if (unlikely(container->num_objects == container->max_objects)) {
> > > +		size_t size = container->max_objects * sizeof(void *);
> > > +		void *tmp;
> > > +
> > > +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> > > +				GFP_KERNEL);
> > > +		if (!tmp)
> > > +			return -ENOMEM;
> > > +
> > > +		container->objects = tmp;
> > > +		container->max_objects += PAGE_SIZE / sizeof(void *);
> > Might be worth it to inquire the actual allocation size here, since if
> > it's kmalloc the generic buckets only cover doubling of sizes, so once
> > it's big it goes up a lot quicker than PAGE_SIZE.
> > 
> > But also krealloc checks this internally already so maybe better to not
> > break the abstraction.
> 
> How can I actually do this? ksize() only works with kmalloc().
> 
> Or do we had a function to figure out if vmalloc or kmalloc was used by
> kvrealloc()?

kvfree() has an is_vmalloc_addr() check, so it would boil down to
open-coding that a bit.

Probably not worth the trouble really, otoh looking at kvrealloc it
doesn't use krealloc underneath, so it's not doing that check. Maybe we
should just push that check into kvrealloc for the !vmalloc_addr case.
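
If we did open-code it, it would presumably look something like this in
drm_exec_objects_add() (untested, just to illustrate the idea):

        tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
                        GFP_KERNEL);
        if (!tmp)
                return -ENOMEM;

        container->objects = tmp;
        /*
         * For the kmalloc case take whatever the bucket actually handed
         * out, vmalloc allocations are page granular anyway.
         */
        if (is_vmalloc_addr(tmp))
                container->max_objects += PAGE_SIZE / sizeof(void *);
        else
                container->max_objects = ksize(tmp) / sizeof(void *);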

> > > [SNIP]
> > > +/**
> > > + * drm_exec_cleanup - cleanup when contention is detected
> > > + * @exec: the drm_exec object to cleanup
> > > + *
> > > + * Cleanup the current state and return true if we should stay inside the retry
> > > + * loop, false if there wasn't any contention detected and we can keep the
> > > + * objects locked.
> > > + */
> > > +bool drm_exec_cleanup(struct drm_exec *exec)
> > > +{
> > > +	if (likely(!exec->contended)) {
> > > +		ww_acquire_done(&exec->ticket);
> > > +		return false;
> > > +	}
> > > +
> > > +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> > > +		exec->contended = NULL;
> > > +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> > Not sure why this is here instead of in _init()? I thought you're playing
> > some really dangerous tricks with re-initting the acquire ctx, which would
> > at least be questionable, but does not look like that.
> 
> That was my initial design, but the problem with this approach is that all
> locks taken between drm_exec_init() and the loop suddenly have a lockdep
> dependency on reservation_ww_class. And that in turn goes boom immediately.
> 
> Took me a moment to realize what's wrong with that as well.

Uh crap, indeed. I think minimally this needs to be documented, but
personally I'm leaning towards drm_exec_prepare_init(), which does this
explicitly.

I do agree we need this split, especially so we can eventually add helpers
for bo lookup, or maybe userptr/hmm prep and things like that, which all
have to be outside of the acquire_ctx.
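
I.e. roughly this flow - sketch only, drm_exec_prepare_init() doesn't
exist and the name is up for bikeshedding:

        drm_exec_init(&exec, true);     /* no ww_acquire_init() here */

        /* handle lookup, userptr/hmm prep, other driver locks, ... */

        drm_exec_prepare_init(&exec);   /* ww_acquire_init() happens here */
        drm_exec_while_not_all_locked(&exec) {
                ret = drm_exec_prepare_obj(&exec, obj, 1);
                drm_exec_continue_on_contention(&exec);
                if (ret)
                        goto error;
        }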

> > [SNIP]
> > +/**
> > + * drm_exec_has_duplicates - check for duplicated GEM object
> > + * @exec: drm_exec object
> > + *
> > + * Return true if the drm_exec object has encountered some already locked GEM
> > + * objects while trying to lock them. This can happen if multiple GEM objects
> > + * share the same underlying resv object.
> > + */
> > +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> > +{
> > +	return exec->duplicates.num_objects > 0;
> > Definitely an aside, but in our i915 efforts to get rid of temporary pins
> > we run into some fun where the eviction code couldn't differentiate from
> > memory we need reserved for the CS and memory we just keep locked because
> > we evicted it and fun stuff like that. So maybe we need a bit more
> > tracking here eventually, but that's only when we have this somehow glued
> > into ttm eviction code.
> 
> Hehe, yeah that's what I was thinking about as well. But then I though one
> step at a time.
> 
> > Also the even more massive step would be to glue this into dma-buf so you
> > can do dynamic dma-buf eviction and still keep track of all the buffers. I
> > think with some clever pointer tagging and a bit more indirection we could
> > nest drm_exec structures (so that a driver could insert it's entire
> > drm_exec structure with a drm_exec-level callback for handling refcounting
> > and stuff like that).
> 
> I considered in which component to put this quite a bit as well, but then
> intentionally decided against DMA-buf.
> 
> One major reason was that not all buffers which needs to be locked this way
> are actually exported as DMA-buf.
> 
> Another reason is that DMA-buf doesn't necessary need a concept of an
> execution context. As far as I can see that's something GPU/DRM driver
> specific.

Yeah I think putting this into the driver subsystem is right. I just wanted
to point out that even with that driver subsystem design we can still pretty
easily put this into dma-buf eventually. And still have the benefit that
each driver would have a structure which operates on the native buffer
object.

> > So anyway I think this all looks good, just one more thing before I think
> > we should land this:
> > 
> > gem helpers in drm_gem_lock_reservations() has something which is
> > practically compatible already, except that you bulk-add the entire set of
> > objects. I think if you add a bulk-prepare function then we could also
> > replace all those. Maybe even nicer if the bulk-prepare takes the array of
> > handles and does the handle lookup too, but at least something which can
> > subsititue drm_gem_lock_reservations with drm_exec would be nice to
> > validate the helpers a bit more and really make sure we only have one of
> > them left.
> 
> I was considering that as well, but then also thought one step at a time.
> Not sure if it's possible to look up handles without running into some
> locking fun, thought.

Since you pointed out the acquire_ctx fun I think that's really the only
issue. It's the combo of e.g. v3d_lookup_bos() and
v3d_lock_bo_reservations().
-Daniel


> Thanks for the review,
> Christian.
> 
> > 
> > Thoughts?
> > -Daniel
> > 
> > > +}
> > > +
> > > +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> > > +void drm_exec_fini(struct drm_exec *exec);
> > > +bool drm_exec_cleanup(struct drm_exec *exec);
> > > +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> > > +			 unsigned int num_fences);
> > > +
> > > +#endif
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
@ 2022-05-11 15:23         ` Daniel Vetter
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-05-11 15:23 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, amd-gfx, dri-devel, Daniel Vetter

On Mon, May 09, 2022 at 05:01:33PM +0200, Christian König wrote:
> Am 09.05.22 um 16:31 schrieb Daniel Vetter:
> > On Wed, May 04, 2022 at 09:47:32AM +0200, Christian König wrote:
> > > [SNIP]
> > > +/* Make sure we have enough room and add an object the container */
> > > +static int drm_exec_objects_add(struct drm_exec_objects *container,
> > > +				struct drm_gem_object *obj)
> > > +{
> > > +	if (unlikely(container->num_objects == container->max_objects)) {
> > > +		size_t size = container->max_objects * sizeof(void *);
> > > +		void *tmp;
> > > +
> > > +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> > > +				GFP_KERNEL);
> > > +		if (!tmp)
> > > +			return -ENOMEM;
> > > +
> > > +		container->objects = tmp;
> > > +		container->max_objects += PAGE_SIZE / sizeof(void *);
> > Might be worth it to inquire the actual allocation size here, since if
> > it's kmalloc the generic buckets only cover doubling of sizes, so once
> > it's big it goes up a lot quicker than PAGE_SIZE.
> > 
> > But also krealloc checks this internally already so maybe better to not
> > break the abstraction.
> 
> How can I actually do this? ksize() only works with kmalloc().
> 
> Or do we had a function to figure out if vmalloc or kmalloc was used by
> kvrealloc()?

kvfree() has an is_vmalloc_addr() check, so it would boil down to
open-coding that a bit.

Probably not worth the trouble really, otoh looking at kvrealloc it
doesn't use krealloc underneath, so it's not doing that check. Maybe we
should just push that check into kvrealloc for the !vmalloc_addr case.

> > > [SNIP]
> > > +/**
> > > + * drm_exec_cleanup - cleanup when contention is detected
> > > + * @exec: the drm_exec object to cleanup
> > > + *
> > > + * Cleanup the current state and return true if we should stay inside the retry
> > > + * loop, false if there wasn't any contention detected and we can keep the
> > > + * objects locked.
> > > + */
> > > +bool drm_exec_cleanup(struct drm_exec *exec)
> > > +{
> > > +	if (likely(!exec->contended)) {
> > > +		ww_acquire_done(&exec->ticket);
> > > +		return false;
> > > +	}
> > > +
> > > +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> > > +		exec->contended = NULL;
> > > +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> > Not sure why this is here instead of in _init()? I thought you're playing
> > some really dangerous tricks with re-initting the acquire ctx, which would
> > at least be questionable, but does not look like that.
> 
> That was my initial design, but the problem with this approach is that all
> locks taken between drm_exec_init() and the loop suddenly have a lockdep
> dependency on reservation_ww_class. And that in turn goes boom immediately.
> 
> Took me a moment to realize what's wrong with that as well.

Uh crap, indeed. I think minimally this needs to be documented, but
personally I'm leaning towards drm_exec_prepare_init(), which does this
explicitly.

I do agree we need this split, especially so we can eventually add helpers
for bo lookup, or maybe userptr/hmm prep and things like that, which all
have to be outside of the acquire_ctx.

> > [SNIP]
> > +/**
> > + * drm_exec_has_duplicates - check for duplicated GEM object
> > + * @exec: drm_exec object
> > + *
> > + * Return true if the drm_exec object has encountered some already locked GEM
> > + * objects while trying to lock them. This can happen if multiple GEM objects
> > + * share the same underlying resv object.
> > + */
> > +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> > +{
> > +	return exec->duplicates.num_objects > 0;
> > Definitely an aside, but in our i915 efforts to get rid of temporary pins
> > we run into some fun where the eviction code couldn't differentiate from
> > memory we need reserved for the CS and memory we just keep locked because
> > we evicted it and fun stuff like that. So maybe we need a bit more
> > tracking here eventually, but that's only when we have this somehow glued
> > into ttm eviction code.
> 
> Hehe, yeah that's what I was thinking about as well. But then I though one
> step at a time.
> 
> > Also the even more massive step would be to glue this into dma-buf so you
> > can do dynamic dma-buf eviction and still keep track of all the buffers. I
> > think with some clever pointer tagging and a bit more indirection we could
> > nest drm_exec structures (so that a driver could insert it's entire
> > drm_exec structure with a drm_exec-level callback for handling refcounting
> > and stuff like that).
> 
> I considered in which component to put this quite a bit as well, but then
> intentionally decided against DMA-buf.
> 
> One major reason was that not all buffers which needs to be locked this way
> are actually exported as DMA-buf.
> 
> Another reason is that DMA-buf doesn't necessary need a concept of an
> execution context. As far as I can see that's something GPU/DRM driver
> specific.

Yeah I think putting this into the driver subsystem is right. I just wanted
to point out that even with that driver subsystem design we can still pretty
easily put this into dma-buf eventually. And still have the benefit that
each driver would have a structure which operates on the native buffer
object.

> > So anyway I think this all looks good, just one more thing before I think
> > we should land this:
> > 
> > gem helpers in drm_gem_lock_reservations() has something which is
> > practically compatible already, except that you bulk-add the entire set of
> > objects. I think if you add a bulk-prepare function then we could also
> > replace all those. Maybe even nicer if the bulk-prepare takes the array of
> > handles and does the handle lookup too, but at least something which can
> > subsititue drm_gem_lock_reservations with drm_exec would be nice to
> > validate the helpers a bit more and really make sure we only have one of
> > them left.
> 
> I was considering that as well, but then also thought one step at a time.
> Not sure if it's possible to look up handles without running into some
> locking fun, thought.

Since you pointed out the acquire_ctx fun I think that's really the only
issue. It's the combo of e.g. v3d_lookup_bos() and
v3d_lock_bo_reservations().
-Daniel


> Thanks for the review,
> Christian.
> 
> > 
> > Thoughts?
> > -Daniel
> > 
> > > +}
> > > +
> > > +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> > > +void drm_exec_fini(struct drm_exec *exec);
> > > +bool drm_exec_cleanup(struct drm_exec *exec);
> > > +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> > > +			 unsigned int num_fences);
> > > +
> > > +#endif
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] drm/amdkfd: switch over to using drm_exec
  2022-05-04  7:47 ` [PATCH 3/8] drm/amdkfd: switch over to using drm_exec Christian König
@ 2022-06-09  0:04   ` Felix Kuehling
  0 siblings, 0 replies; 22+ messages in thread
From: Felix Kuehling @ 2022-06-09  0:04 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx, daniel; +Cc: Christian König


On 2022-05-04 03:47, Christian König wrote:
> Avoids quite a bit of logic and kmalloc overhead.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   5 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 303 +++++++-----------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  14 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   3 +
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  32 +-
>   5 files changed, 152 insertions(+), 205 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 4cb14c2fe53f..3f3a994c68e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -25,12 +25,12 @@
>   #ifndef AMDGPU_AMDKFD_H_INCLUDED
>   #define AMDGPU_AMDKFD_H_INCLUDED
>   
> +#include <linux/list.h>
>   #include <linux/types.h>
>   #include <linux/mm.h>
>   #include <linux/kthread.h>
>   #include <linux/workqueue.h>
>   #include <kgd_kfd_interface.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>   #include "amdgpu_sync.h"
>   #include "amdgpu_vm.h"
>   
> @@ -66,8 +66,7 @@ struct kgd_mem {
>   	struct dma_buf *dmabuf;
>   	struct list_head attachments;
>   	/* protected by amdkfd_process_info.lock */
> -	struct ttm_validate_buffer validate_list;
> -	struct ttm_validate_buffer resv_list;
> +	struct list_head validate_list;
>   	uint32_t domain;
>   	unsigned int mapped_to_gpu_memory;
>   	uint64_t va;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 3dc5ab2764ff..64ac4f8f49be 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -25,6 +25,8 @@
>   #include <linux/sched/mm.h>
>   #include <linux/sched/task.h>
>   
> +#include <drm/drm_exec.h>
> +
>   #include "amdgpu_object.h"
>   #include "amdgpu_gem.h"
>   #include "amdgpu_vm.h"
> @@ -770,28 +772,19 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
>   				struct amdkfd_process_info *process_info,
>   				bool userptr)
>   {
> -	struct ttm_validate_buffer *entry = &mem->validate_list;
> -	struct amdgpu_bo *bo = mem->bo;
> -
> -	INIT_LIST_HEAD(&entry->head);
> -	entry->num_shared = 1;
> -	entry->bo = &bo->tbo;
> -	mutex_lock(&process_info->lock);

You removed mutex_lock, but left mutex_unlock below. Other than that, 
this patch looks reasonable. But my eyes may have glazed over with this 
much churn.

Regards,
   Felix


>   	if (userptr)
> -		list_add_tail(&entry->head, &process_info->userptr_valid_list);
> +		list_add_tail(&mem->validate_list,
> +			      &process_info->userptr_valid_list);
>   	else
> -		list_add_tail(&entry->head, &process_info->kfd_bo_list);
> +		list_add_tail(&mem->validate_list, &process_info->kfd_bo_list);
>   	mutex_unlock(&process_info->lock);
>   }
>   
>   static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem *mem,
>   		struct amdkfd_process_info *process_info)
>   {
> -	struct ttm_validate_buffer *bo_list_entry;
> -
> -	bo_list_entry = &mem->validate_list;
>   	mutex_lock(&process_info->lock);
> -	list_del(&bo_list_entry->head);
> +	list_del(&mem->validate_list);
>   	mutex_unlock(&process_info->lock);
>   }
>   
> @@ -875,13 +868,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
>    * object can track VM updates.
>    */
>   struct bo_vm_reservation_context {
> -	struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
> -	unsigned int n_vms;		    /* Number of VMs reserved	    */
> -	struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
> -	struct ww_acquire_ctx ticket;	    /* Reservation ticket	    */
> -	struct list_head list, duplicates;  /* BO lists			    */
> -	struct amdgpu_sync *sync;	    /* Pointer to sync object	    */
> -	bool reserved;			    /* Whether BOs are reserved	    */
> +	/* DRM execution context for the reservation */
> +	struct drm_exec exec;
> +	/* Number of VMs reserved */
> +	unsigned int n_vms;
> +	/* Pointer to sync object */
> +	struct amdgpu_sync *sync;
>   };
>   
>   enum bo_vm_match {
> @@ -905,35 +897,24 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
>   
>   	WARN_ON(!vm);
>   
> -	ctx->reserved = false;
>   	ctx->n_vms = 1;
>   	ctx->sync = &mem->sync;
> -
> -	INIT_LIST_HEAD(&ctx->list);
> -	INIT_LIST_HEAD(&ctx->duplicates);
> -
> -	ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
> -	if (!ctx->vm_pd)
> -		return -ENOMEM;
> -
> -	ctx->kfd_bo.priority = 0;
> -	ctx->kfd_bo.tv.bo = &bo->tbo;
> -	ctx->kfd_bo.tv.num_shared = 1;
> -	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
> -
> -	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
> -
> -	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
> -				     false, &ctx->duplicates);
> -	if (ret) {
> -		pr_err("Failed to reserve buffers in ttm.\n");
> -		kfree(ctx->vm_pd);
> -		ctx->vm_pd = NULL;
> -		return ret;
> +	drm_exec_init(&ctx->exec, true);
> +	drm_exec_while_not_all_locked(&ctx->exec) {
> +		ret = amdgpu_vm_lock_pd(vm, &ctx->exec);
> +		if (likely(!ret))
> +			ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base,
> +						   0);
> +		drm_exec_continue_on_contention(&ctx->exec);
> +		if (unlikely(ret))
> +			goto error;
>   	}
> -
> -	ctx->reserved = true;
>   	return 0;
> +
> +error:
> +	pr_err("Failed to reserve buffers in ttm.\n");
> +	drm_exec_fini(&ctx->exec);
> +	return ret;
>   }
>   
>   /**
> @@ -950,63 +931,39 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
>   				struct amdgpu_vm *vm, enum bo_vm_match map_type,
>   				struct bo_vm_reservation_context *ctx)
>   {
> -	struct amdgpu_bo *bo = mem->bo;
>   	struct kfd_mem_attachment *entry;
> -	unsigned int i;
> +	struct amdgpu_bo *bo = mem->bo;
>   	int ret;
>   
> -	ctx->reserved = false;
> -	ctx->n_vms = 0;
> -	ctx->vm_pd = NULL;
>   	ctx->sync = &mem->sync;
> +	drm_exec_init(&ctx->exec, true);
> +	drm_exec_while_not_all_locked(&ctx->exec) {
> +		ctx->n_vms = 0;
> +		list_for_each_entry(entry, &mem->attachments, list) {
> +			if ((vm && vm != entry->bo_va->base.vm) ||
> +				(entry->is_mapped != map_type
> +				&& map_type != BO_VM_ALL))
> +				continue;
>   
> -	INIT_LIST_HEAD(&ctx->list);
> -	INIT_LIST_HEAD(&ctx->duplicates);
> -
> -	list_for_each_entry(entry, &mem->attachments, list) {
> -		if ((vm && vm != entry->bo_va->base.vm) ||
> -			(entry->is_mapped != map_type
> -			&& map_type != BO_VM_ALL))
> -			continue;
> -
> -		ctx->n_vms++;
> -	}
> -
> -	if (ctx->n_vms != 0) {
> -		ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
> -				     GFP_KERNEL);
> -		if (!ctx->vm_pd)
> -			return -ENOMEM;
> -	}
> -
> -	ctx->kfd_bo.priority = 0;
> -	ctx->kfd_bo.tv.bo = &bo->tbo;
> -	ctx->kfd_bo.tv.num_shared = 1;
> -	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
> -
> -	i = 0;
> -	list_for_each_entry(entry, &mem->attachments, list) {
> -		if ((vm && vm != entry->bo_va->base.vm) ||
> -			(entry->is_mapped != map_type
> -			&& map_type != BO_VM_ALL))
> -			continue;
> -
> -		amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
> -				&ctx->vm_pd[i]);
> -		i++;
> -	}
> +			ret = amdgpu_vm_lock_pd(vm, &ctx->exec);
> +			drm_exec_break_on_contention(&ctx->exec);
> +			if (unlikely(ret))
> +				goto error;
> +			++ctx->n_vms;
> +		}
> +		drm_exec_continue_on_contention(&ctx->exec);
>   
> -	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
> -				     false, &ctx->duplicates);
> -	if (ret) {
> -		pr_err("Failed to reserve buffers in ttm.\n");
> -		kfree(ctx->vm_pd);
> -		ctx->vm_pd = NULL;
> -		return ret;
> +		ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
> +		drm_exec_continue_on_contention(&ctx->exec);
> +		if (unlikely(ret))
> +			goto error;
>   	}
> -
> -	ctx->reserved = true;
>   	return 0;
> +
> +error:
> +	pr_err("Failed to reserve buffers in ttm.\n");
> +	drm_exec_fini(&ctx->exec);
> +	return ret;
>   }
>   
>   /**
> @@ -1027,15 +984,8 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
>   	if (wait)
>   		ret = amdgpu_sync_wait(ctx->sync, intr);
>   
> -	if (ctx->reserved)
> -		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
> -	kfree(ctx->vm_pd);
> -
> +	drm_exec_fini(&ctx->exec);
>   	ctx->sync = NULL;
> -
> -	ctx->reserved = false;
> -	ctx->vm_pd = NULL;
> -
>   	return ret;
>   }
>   
> @@ -1616,7 +1566,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   	unsigned long bo_size = mem->bo->tbo.base.size;
>   	struct kfd_mem_attachment *entry, *tmp;
>   	struct bo_vm_reservation_context ctx;
> -	struct ttm_validate_buffer *bo_list_entry;
>   	unsigned int mapped_to_gpu_memory;
>   	int ret;
>   	bool is_imported = false;
> @@ -1644,9 +1593,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   	}
>   
>   	/* Make sure restore workers don't access the BO any more */
> -	bo_list_entry = &mem->validate_list;
>   	mutex_lock(&process_info->lock);
> -	list_del(&bo_list_entry->head);
> +	list_del(&mem->validate_list);
>   	mutex_unlock(&process_info->lock);
>   
>   	/* No more MMU notifiers */
> @@ -1945,7 +1893,7 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
>   
>   	amdgpu_amdkfd_remove_eviction_fence(
>   		bo, mem->process_info->eviction_fence);
> -	list_del_init(&mem->validate_list.head);
> +	list_del_init(&mem->validate_list);
>   
>   	if (size)
>   		*size = amdgpu_bo_size(bo);
> @@ -2107,7 +2055,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>   	 */
>   	list_for_each_entry_safe(mem, tmp_mem,
>   				 &process_info->userptr_valid_list,
> -				 validate_list.head) {
> +				 validate_list) {
>   		if (!atomic_read(&mem->invalid))
>   			continue; /* BO is still valid */
>   
> @@ -2124,7 +2072,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>   			return -EAGAIN;
>   		}
>   
> -		list_move_tail(&mem->validate_list.head,
> +		list_move_tail(&mem->validate_list,
>   			       &process_info->userptr_inval_list);
>   	}
>   
> @@ -2133,7 +2081,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>   
>   	/* Go through userptr_inval_list and update any invalid user_pages */
>   	list_for_each_entry(mem, &process_info->userptr_inval_list,
> -			    validate_list.head) {
> +			    validate_list) {
>   		invalid = atomic_read(&mem->invalid);
>   		if (!invalid)
>   			/* BO hasn't been invalidated since the last
> @@ -2184,50 +2132,44 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>    */
>   static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>   {
> -	struct amdgpu_bo_list_entry *pd_bo_list_entries;
> -	struct list_head resv_list, duplicates;
> -	struct ww_acquire_ctx ticket;
> +	struct ttm_operation_ctx ctx = { false, false };
>   	struct amdgpu_sync sync;
> +	struct drm_exec exec;
>   
>   	struct amdgpu_vm *peer_vm;
>   	struct kgd_mem *mem, *tmp_mem;
>   	struct amdgpu_bo *bo;
> -	struct ttm_operation_ctx ctx = { false, false };
> -	int i, ret;
> -
> -	pd_bo_list_entries = kcalloc(process_info->n_vms,
> -				     sizeof(struct amdgpu_bo_list_entry),
> -				     GFP_KERNEL);
> -	if (!pd_bo_list_entries) {
> -		pr_err("%s: Failed to allocate PD BO list entries\n", __func__);
> -		ret = -ENOMEM;
> -		goto out_no_mem;
> -	}
> -
> -	INIT_LIST_HEAD(&resv_list);
> -	INIT_LIST_HEAD(&duplicates);
> +	int ret;
>   
> -	/* Get all the page directory BOs that need to be reserved */
> -	i = 0;
> -	list_for_each_entry(peer_vm, &process_info->vm_list_head,
> -			    vm_list_node)
> -		amdgpu_vm_get_pd_bo(peer_vm, &resv_list,
> -				    &pd_bo_list_entries[i++]);
> -	/* Add the userptr_inval_list entries to resv_list */
> -	list_for_each_entry(mem, &process_info->userptr_inval_list,
> -			    validate_list.head) {
> -		list_add_tail(&mem->resv_list.head, &resv_list);
> -		mem->resv_list.bo = mem->validate_list.bo;
> -		mem->resv_list.num_shared = mem->validate_list.num_shared;
> -	}
> +	amdgpu_sync_create(&sync);
>   
> +	drm_exec_init(&exec, true);
>   	/* Reserve all BOs and page tables for validation */
> -	ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
> -	WARN(!list_empty(&duplicates), "Duplicates should be empty");
> -	if (ret)
> -		goto out_free;
> +	drm_exec_while_not_all_locked(&exec) {
> +		/* Reserve all the page directories */
> +		list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +				    vm_list_node) {
> +			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
> +			drm_exec_break_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto unreserve_out;
> +		}
> +		drm_exec_continue_on_contention(&exec);
>   
> -	amdgpu_sync_create(&sync);
> +		/* Reserve the userptr_inval_list entries to resv_list */
> +		list_for_each_entry(mem, &process_info->userptr_inval_list,
> +				    validate_list) {
> +			struct drm_gem_object *gobj;
> +
> +			gobj = &mem->bo->tbo.base;
> +			ret = drm_exec_prepare_obj(&exec, gobj, 1);
> +			drm_exec_break_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto unreserve_out;
> +		}
> +		drm_exec_continue_on_contention(&exec);
> +	}
> +	WARN(drm_exec_has_duplicates(&exec), "Duplicates should be empty");
>   
>   	ret = process_validate_vms(process_info);
>   	if (ret)
> @@ -2236,7 +2178,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>   	/* Validate BOs and update GPUVM page tables */
>   	list_for_each_entry_safe(mem, tmp_mem,
>   				 &process_info->userptr_inval_list,
> -				 validate_list.head) {
> +				 validate_list) {
>   		struct kfd_mem_attachment *attachment;
>   
>   		bo = mem->bo;
> @@ -2251,7 +2193,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>   			}
>   		}
>   
> -		list_move_tail(&mem->validate_list.head,
> +		list_move_tail(&mem->validate_list,
>   			       &process_info->userptr_valid_list);
>   
>   		/* Update mapping. If the BO was not validated
> @@ -2279,12 +2221,9 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>   	ret = process_update_pds(process_info, &sync);
>   
>   unreserve_out:
> -	ttm_eu_backoff_reservation(&ticket, &resv_list);
> +	drm_exec_fini(&exec);
>   	amdgpu_sync_wait(&sync, false);
>   	amdgpu_sync_free(&sync);
> -out_free:
> -	kfree(pd_bo_list_entries);
> -out_no_mem:
>   
>   	return ret;
>   }
> @@ -2381,50 +2320,46 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>    */
>   int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   {
> -	struct amdgpu_bo_list_entry *pd_bo_list;
>   	struct amdkfd_process_info *process_info = info;
>   	struct amdgpu_vm *peer_vm;
>   	struct kgd_mem *mem;
> -	struct bo_vm_reservation_context ctx;
>   	struct amdgpu_amdkfd_fence *new_fence;
> -	int ret = 0, i;
>   	struct list_head duplicate_save;
>   	struct amdgpu_sync sync_obj;
>   	unsigned long failed_size = 0;
>   	unsigned long total_size = 0;
> +	struct drm_exec exec;
> +	int ret;
>   
>   	INIT_LIST_HEAD(&duplicate_save);
> -	INIT_LIST_HEAD(&ctx.list);
> -	INIT_LIST_HEAD(&ctx.duplicates);
>   
> -	pd_bo_list = kcalloc(process_info->n_vms,
> -			     sizeof(struct amdgpu_bo_list_entry),
> -			     GFP_KERNEL);
> -	if (!pd_bo_list)
> -		return -ENOMEM;
> -
> -	i = 0;
>   	mutex_lock(&process_info->lock);
> -	list_for_each_entry(peer_vm, &process_info->vm_list_head,
> -			vm_list_node)
> -		amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++]);
>   
> -	/* Reserve all BOs and page tables/directory. Add all BOs from
> -	 * kfd_bo_list to ctx.list
> -	 */
> -	list_for_each_entry(mem, &process_info->kfd_bo_list,
> -			    validate_list.head) {
> -
> -		list_add_tail(&mem->resv_list.head, &ctx.list);
> -		mem->resv_list.bo = mem->validate_list.bo;
> -		mem->resv_list.num_shared = mem->validate_list.num_shared;
> -	}
> +	drm_exec_init(&exec, false);
> +	drm_exec_while_not_all_locked(&exec) {
> +		list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +				    vm_list_node) {
> +			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
> +			drm_exec_break_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto ttm_reserve_fail;
> +		}
> +		drm_exec_continue_on_contention(&exec);
>   
> -	ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
> -				     false, &duplicate_save);
> -	if (ret) {
> -		pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
> -		goto ttm_reserve_fail;
> +		/* Reserve all BOs and page tables/directory. Add all BOs from
> +		 * kfd_bo_list to ctx.list
> +		 */
> +		list_for_each_entry(mem, &process_info->kfd_bo_list,
> +				    validate_list) {
> +			struct drm_gem_object *gobj;
> +
> +			gobj = &mem->bo->tbo.base;
> +			ret = drm_exec_prepare_obj(&exec, gobj, 1);
> +			drm_exec_break_on_contention(&exec);
> +			if (unlikely(ret))
> +				goto ttm_reserve_fail;
> +		}
> +		drm_exec_continue_on_contention(&exec);
>   	}
>   
>   	amdgpu_sync_create(&sync_obj);
> @@ -2442,7 +2377,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   
>   	/* Validate BOs and map them to GPUVM (update VM page tables). */
>   	list_for_each_entry(mem, &process_info->kfd_bo_list,
> -			    validate_list.head) {
> +			    validate_list) {
>   
>   		struct amdgpu_bo *bo = mem->bo;
>   		uint32_t domain = mem->domain;
> @@ -2515,8 +2450,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   	*ef = dma_fence_get(&new_fence->base);
>   
>   	/* Attach new eviction fence to all BOs */
> -	list_for_each_entry(mem, &process_info->kfd_bo_list,
> -		validate_list.head)
> +	list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list)
>   		amdgpu_bo_fence(mem->bo,
>   			&process_info->eviction_fence->base, true);
>   
> @@ -2529,11 +2463,10 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   	}
>   
>   validate_map_fail:
> -	ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
>   	amdgpu_sync_free(&sync_obj);
>   ttm_reserve_fail:
> +	drm_exec_fini(&exec);
>   	mutex_unlock(&process_info->lock);
> -	kfree(pd_bo_list);
>   	return ret;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5277c10d901d..c82c580f1df5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -33,6 +33,7 @@
>   
>   #include <drm/amdgpu_drm.h>
>   #include <drm/drm_drv.h>
> +#include <drm/drm_exec.h>
>   #include "amdgpu.h"
>   #include "amdgpu_trace.h"
>   #include "amdgpu_amdkfd.h"
> @@ -639,6 +640,19 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
>   	list_add(&entry->tv.head, validated);
>   }
>   
> +/**
> + * amdgpu_vm_lock_pd - lock PD in drm_exec
> + *
> + * @vm: vm providing the BOs
> + * @exec: drm execution context
> + *
> + * Lock the VM root PD in the DRM execution context.
> + */
> +int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec)
> +{
> +	return drm_exec_prepare_obj(exec, &vm->root.bo->tbo.base, 4);
> +}
> +
>   /**
>    * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
>    *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index bd7892482bbf..15d26f442e70 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -36,6 +36,8 @@
>   #include "amdgpu_ring.h"
>   #include "amdgpu_ids.h"
>   
> +struct drm_exec;
> +
>   struct amdgpu_bo_va;
>   struct amdgpu_job;
>   struct amdgpu_bo_list_entry;
> @@ -383,6 +385,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>   void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
>   			 struct list_head *validated,
>   			 struct amdgpu_bo_list_entry *entry);
> +int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec);
>   bool amdgpu_vm_ready(struct amdgpu_vm *vm);
>   int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   			      int (*callback)(void *p, struct amdgpu_bo *bo),
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index b3fc3e958227..b5cb234e9f77 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -23,6 +23,8 @@
>   
>   #include <linux/types.h>
>   #include <linux/sched/task.h>
> +#include <drm/drm_exec.h>
> +
>   #include "amdgpu_sync.h"
>   #include "amdgpu_object.h"
>   #include "amdgpu_vm.h"
> @@ -1373,9 +1375,7 @@ struct svm_validate_context {
>   	struct svm_range *prange;
>   	bool intr;
>   	unsigned long bitmap[MAX_GPU_INSTANCE];
> -	struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
> -	struct list_head validate_list;
> -	struct ww_acquire_ctx ticket;
> +	struct drm_exec exec;
>   };
>   
>   static int svm_range_reserve_bos(struct svm_validate_context *ctx)
> @@ -1385,25 +1385,23 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
>   	uint32_t gpuidx;
>   	int r;
>   
> -	INIT_LIST_HEAD(&ctx->validate_list);
> +	drm_exec_init(&ctx->exec, true);
>   	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
>   		pdd = kfd_process_device_from_gpuidx(ctx->process, gpuidx);
>   		if (!pdd) {
>   			pr_debug("failed to find device idx %d\n", gpuidx);
> -			return -EINVAL;
> +			r = -EINVAL;
> +			goto unreserve_out;
>   		}
>   		vm = drm_priv_to_vm(pdd->drm_priv);
>   
> -		ctx->tv[gpuidx].bo = &vm->root.bo->tbo;
> -		ctx->tv[gpuidx].num_shared = 4;
> -		list_add(&ctx->tv[gpuidx].head, &ctx->validate_list);
> -	}
> -
> -	r = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->validate_list,
> -				   ctx->intr, NULL);
> -	if (r) {
> -		pr_debug("failed %d to reserve bo\n", r);
> -		return r;
> +		r = amdgpu_vm_lock_pd(vm, &ctx->exec);
> +		if (unlikely(r == -EDEADLK))
> +			continue;
> +		if (unlikely(r)) {
> +			pr_debug("failed %d to reserve bo\n", r);
> +			goto unreserve_out;
> +		}
>   	}
>   
>   	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
> @@ -1426,13 +1424,13 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
>   	return 0;
>   
>   unreserve_out:
> -	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
> +	drm_exec_fini(&ctx->exec);
>   	return r;
>   }
>   
>   static void svm_range_unreserve_bos(struct svm_validate_context *ctx)
>   {
> -	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
> +	drm_exec_fini(&ctx->exec);
>   }
>   
>   static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
  2022-05-09 14:31     ` Daniel Vetter
@ 2022-06-09  0:05   ` Felix Kuehling
  2022-06-09  7:18     ` Christian König
  1 sibling, 1 reply; 22+ messages in thread
From: Felix Kuehling @ 2022-06-09  0:05 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx, daniel; +Cc: Christian König


On 2022-05-04 03:47, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTM execbuf util and intended to replace
> it in the long term.
>
> The basic functionality is that we abstract the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
>
> v2: drop xarray and use a dynamically resized array instead, the locking
>      overhead is unnecessary and measurable.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

I finally got around to catching up with this thread. See some questions 
and comments inline.


> ---
>   Documentation/gpu/drm-mm.rst |  12 ++
>   drivers/gpu/drm/Kconfig      |   7 +
>   drivers/gpu/drm/Makefile     |   2 +
>   drivers/gpu/drm/drm_exec.c   | 295 +++++++++++++++++++++++++++++++++++
>   include/drm/drm_exec.h       | 144 +++++++++++++++++
>   5 files changed, 460 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index f32ccce5722d..bf7dd2a78e9b 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index e88c497fa010..1b35c10df263 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -179,6 +179,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> @@ -252,6 +258,7 @@ config DRM_AMDGPU
>   	select DRM_SCHED
>   	select DRM_TTM
>   	select DRM_TTM_HELPER
> +	select DRM_EXEC
>   	select POWER_SUPPLY
>   	select HWMON
>   	select BACKLIGHT_CLASS_DEVICE
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 15fe3163f822..ee8573b683f3 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -37,6 +37,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..ed2106c22786
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,295 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_lock(&exec, boB, 1);

Where is drm_exec_lock? It's not in this patch.


> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct drm_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Initialize the drm_exec_objects container */
> +static void drm_exec_objects_init(struct drm_exec_objects *container)
> +{
> +	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);

Should this be kvmalloc? You're using kvrealloc and kvfree elsewhere.


> +
> +	/* If allocation here fails, just delay that till the first use */
> +	container->max_objects = container->objects ?
> +		PAGE_SIZE / sizeof(void *) : 0;
> +	container->num_objects = 0;
> +}
> +
> +/* Cleanup the drm_exec_objects container */
> +static void drm_exec_objects_fini(struct drm_exec_objects *container)
> +{
> +	kvfree(container->objects);
> +}
> +
> +/* Make sure we have enough room and add an object to the container */
> +static int drm_exec_objects_add(struct drm_exec_objects *container,
> +				struct drm_gem_object *obj)
> +{
> +	if (unlikely(container->num_objects == container->max_objects)) {
> +		size_t size = container->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		container->objects = tmp;
> +		container->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	container->objects[container->num_objects++] = obj;
> +	return 0;
> +}
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_duplicate_object(exec, index, obj)
> +		drm_gem_object_put(obj);
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked and duplicate
> + * objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	drm_exec_objects_init(&exec->locked);
> +	drm_exec_objects_init(&exec->duplicates);
> +	exec->contended = DRM_EXEC_DUMMY;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	drm_exec_objects_fini(&exec->locked);
> +	drm_exec_objects_fini(&exec->duplicates);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->locked.num_objects = 0;
> +	exec->duplicates.num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the container and reserve fences */
> +static int drm_exec_obj_locked(struct drm_exec_objects *container,
> +			       struct drm_gem_object *obj,
> +			       unsigned int num_fences)
> +{
> +	int ret;
> +
> +	if (container) {
> +		ret = drm_exec_objects_add(container, obj);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (num_fences) {
> +		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> +		if (ret)
> +			goto error_erase;
> +	}
> +
> +	return 0;
> +
> +error_erase:
> +	if (container) {
> +		--container->num_objects;
> +		drm_gem_object_put(obj);
> +	}
> +	return ret;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
> +	if (unlikely(ret))
> +		dma_resv_unlock(obj->resv);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container. Duplicates
> + * are detected as well and automatically moved into the duplicates container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);

If this succeeds, it won't reserve any fence slots for object. Is that a 
problem?


> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY)) {
> +		struct drm_exec_objects *container = &exec->duplicates;
> +
> +		/*
> +		 * If this is the first locked GEM object it was most likely
> +		 * just contended. So don't add it to the duplicates, just
> +		 * reserve the fence slots.

I don't understand this. Seems a bit arbitrary. Is it even legal to try 
to add the same object twice? I thought duplicates was for different 
objects that share the same reservation, not actually the same object on 
the same list twice.

Maybe you meant to compare with 
exec->locked.objects[exec->locked.num_objects-1], assuming that 
drm_exec_lock_contended just succeeded locking a previously contended 
object, and the caller retried locking that same object again?


> +		 */
> +		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
> +			container = NULL;
> +
> +		ret = drm_exec_obj_locked(container, obj, num_fences);
> +		if (ret)
> +			return ret;
> +
> +	} else if (unlikely(ret)) {
> +		return ret;
> +
> +	} else {
> +		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
> +		if (ret)
> +			goto error_unlock;
> +	}
> +
> +	drm_gem_object_get(obj);

The container already gets a reference to obj. What is this extra 
reference for? And where does it get dropped?

Regards,
   Felix


> +	return 0;
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..f73981c6292e
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,144 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec_objects - Container for GEM objects in a drm_exec
> + */
> +struct drm_exec_objects {
> +	unsigned int		num_objects;
> +	unsigned int		max_objects;
> +	struct drm_gem_object	**objects;
> +};
> +
> +/**
> + * drm_exec_objects_for_each - iterate over all the objects inside the container
> + */
> +#define drm_exec_objects_for_each(array, index, obj)		\
> +	for (index = 0, obj = (array)->objects[0];		\
> +	     index < (array)->num_objects;			\
> +	     ++index, obj = (array)->objects[index])
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @locked: container for the locked GEM objects
> +	 */
> +	struct drm_exec_objects	locked;
> +
> +	/**
> +	 * @duplicates: container for the duplicated GEM objects
> +	 */
> +	struct drm_exec_objects	duplicates;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for.
> +	 */
> +	struct drm_gem_object	*contended;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->locked, index, obj)
> +
> +/**
> + * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the duplicate GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
> +	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinate loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinate loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +/**
> + * drm_exec_has_duplicates - check for duplicated GEM object
> + * @exec: drm_exec object
> + *
> + * Return true if the drm_exec object has encountered some already locked GEM
> + * objects while trying to lock them. This can happen if multiple GEM objects
> + * share the same underlying resv object.
> + */
> +static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
> +{
> +	return exec->duplicates.num_objects > 0;
> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +
> +#endif

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-06-09  0:05   ` Felix Kuehling
@ 2022-06-09  7:18     ` Christian König
  2022-06-09 14:28       ` Felix Kuehling
  0 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2022-06-09  7:18 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, dri-devel, amd-gfx, daniel

Am 09.06.22 um 02:05 schrieb Felix Kuehling:
> [SNIP]
>> + *
>> + *        ret = drm_exec_lock(&exec, boB, 1);
>
> Where is drm_exec_lock? It's not in this patch.

I've renamed this function to drm_exec_prepare_obj() but forgot to 
update the documentation. Going to fix this, thanks.
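
I.e. the snippet in the overview would then read something like:

	ret = drm_exec_prepare_obj(&exec, boB, 1);
	drm_exec_continue_on_contention(&exec);
	if (ret)
		goto error;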

>
>
>> + * drm_exec_continue_on_contention(&exec);
>> + *        if (ret)
>> + *            goto error;
>> + *    }
>> + *
>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *        ...
>> + *    }
>> + *    drm_exec_fini(&exec);
>> + *
>> + * See struct drm_exec for more details.
>> + */
>> +
>> +/* Dummy value used to initially enter the retry loop */
>> +#define DRM_EXEC_DUMMY (void*)~0
>> +
>> +/* Initialize the drm_exec_objects container */
>> +static void drm_exec_objects_init(struct drm_exec_objects *container)
>> +{
>> +    container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>
> Should this be kvmalloc? You're using kvrealloc and kvfree elsewhere.

The initial number of objects tracked should be small enough so that we 
can use kmalloc() directly.

> [SNIP]
>>
>> +/**
>> + * drm_exec_prepare_obj - prepare a GEM object for use
>> + * @exec: the drm_exec object with the state
>> + * @obj: the GEM object to prepare
>> + * @num_fences: how many fences to reserve
>> + *
>> + * Prepare a GEM object for use by locking it and reserving fence 
>> slots. All
>> + * succesfully locked objects are put into the locked container. 
>> Duplicates
>> + * detected as well and automatically moved into the duplicates 
>> container.
>> + *
>> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    ret = drm_exec_lock_contended(exec);
>
> If this succeeds, it won't reserve any fence slots for object. Is that 
> a problem?

No, the contended object should be re-presented later on and then we 
allocate the fence slots.

>>
>> +    if (unlikely(ret == -EALREADY)) {
>> +        struct drm_exec_objects *container = &exec->duplicates;
>> +
>> +        /*
>> +         * If this is the first locked GEM object it was most likely
>> +         * just contended. So don't add it to the duplicates, just
>> +         * reserve the fence slots.
>
> I don't understand this. Seems a bit arbitrary. Is it even legal to 
> try to add the same object twice? I thought duplicates was for 
> different objects that share the same reservation, not actually the 
> same object on the same list twice.

Yes, it's legal to try the same object twice. That can easily happen 
when userspace specifies the same handle twice for a submission.
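
As a sketch of that case (objA/objB are just stand-ins for the objects
behind the submitted handles):

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		ret = drm_exec_prepare_obj(&exec, objA, 1);	/* -> exec.locked */
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

		ret = drm_exec_prepare_obj(&exec, objB, 1);	/* -> exec.locked */
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

		/* same object once more: dma_resv_lock() returns -EALREADY and
		 * the object is tracked in exec.duplicates instead of being
		 * locked twice */
		ret = drm_exec_prepare_obj(&exec, objB, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;
	}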

The other possibility is that we had a contention, backed off and then
locked the contended object first and then re-tried everything else once more.

> Maybe you meant to compare with 
> exec->locked.objects[exec->locked.num_objects-1], assuming that 
> drm_exec_lock_contended just succeeded locking a previously contended 
> object, and the caller retried locking that same object again?

Yes, that's pretty much the idea. But then exec->locked.num_objects is
just 1 so that's equal to checking exec->locked.objects[0].

On the other hand we would also catch cases where we lock the same 
object multiple times directly behind each other. Need to think about 
that a bit.
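
I.e. something like your suggestion, checking against the last locked
object instead of the first one:

	if (exec->locked.num_objects &&
	    exec->locked.objects[exec->locked.num_objects - 1] == obj)
		container = NULL;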

>> +         */
>> +        if (exec->locked.num_objects && exec->locked.objects[0] == obj)
>> +            container = NULL;
>> +
>> +        ret = drm_exec_obj_locked(container, obj, num_fences);
>> +        if (ret)
>> +            return ret;
>> +
>> +    } else if (unlikely(ret)) {
>> +        return ret;
>> +
>> +    } else {
>> +        ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
>> +        if (ret)
>> +            goto error_unlock;
>> +    }
>> +
>> +    drm_gem_object_get(obj);
>
> The container already gets a reference to obj. What is this extra 
> reference for? And where does it get dropped?

Need to double check this, might be that it's just a leftover.

Thanks,
Christian.

>
> Regards,
>   Felix


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/8] drm: execution context for GEM buffers v2
  2022-06-09  7:18     ` Christian König
@ 2022-06-09 14:28       ` Felix Kuehling
  0 siblings, 0 replies; 22+ messages in thread
From: Felix Kuehling @ 2022-06-09 14:28 UTC (permalink / raw)
  To: Christian König, Christian König, dri-devel, amd-gfx, daniel


Am 2022-06-09 um 03:18 schrieb Christian König:
> Am 09.06.22 um 02:05 schrieb Felix Kuehling:
>> [SNIP]
>>> + *
>>> + *        ret = drm_exec_lock(&exec, boB, 1);
>>
>> Where is drm_exec_lock? It's not in this patch.
>
> I've renamed this function to drm_exec_prepare_obj() but forgot to 
> update the documentation. Going to fix this, thanks.
>
>>
>>
>>> + * drm_exec_continue_on_contention(&exec);
>>> + *        if (ret)
>>> + *            goto error;
>>> + *    }
>>> + *
>>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>>> + *        ...
>>> + *    }
>>> + *    drm_exec_fini(&exec);
>>> + *
>>> + * See struct drm_exec for more details.
>>> + */
>>> +
>>> +/* Dummy value used to initially enter the retry loop */
>>> +#define DRM_EXEC_DUMMY (void*)~0
>>> +
>>> +/* Initialize the drm_exec_objects container */
>>> +static void drm_exec_objects_init(struct drm_exec_objects *container)
>>> +{
>>> +    container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>
>> Should this be kvmalloc? You're using kvrealloc and kvfree elsewhere.
>
> The initial number of objects tracked should be small enough so that 
> we can use kmalloc() directly.

Ok. It just seems weird to allocate with kmalloc and then kvrealloc the 
same pointer. But if that's safe it doesn't matter.

BTW, if the caller already knows the number of BOs it wants to add, 
would it make sense to add that as a parameter to drm_exec_objects_init 
to save you all the reallocs?
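
Just to illustrate what I mean, a rough sketch (hypothetical signature,
not meant as the actual implementation):

static void drm_exec_objects_init(struct drm_exec_objects *container,
				  unsigned int expected)
{
	size_t size = max_t(size_t, PAGE_SIZE,
			    round_up(expected * sizeof(void *), PAGE_SIZE));

	container->objects = kvmalloc(size, GFP_KERNEL);
	/* If allocation here fails, just delay that till the first use */
	container->max_objects = container->objects ?
		size / sizeof(void *) : 0;
	container->num_objects = 0;
}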


>
>> [SNIP]
>>>
>>> +/**
>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>> + * @exec: the drm_exec object with the state
>>> + * @obj: the GEM object to prepare
>>> + * @num_fences: how many fences to reserve
>>> + *
>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>> slots. All
>>> + * succesfully locked objects are put into the locked container. 
>>> Duplicates
>>> + * detected as well and automatically moved into the duplicates 
>>> container.
>>> + *
>>> + * Returns: -EDEADLK if a contention is detected, -ENOMEM when memory
>>> + * allocation failed and zero for success.
>>> + */
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>> drm_gem_object *obj,
>>> +             unsigned int num_fences)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = drm_exec_lock_contended(exec);
>>
>> If this succeeds, it won't reserve any fence slots for object. Is 
>> that a problem?
>
> No, the contended object should be re-presented later on and then we 
> allocate the fence slots.

Ok. Along with all other objects, because drm_exec_cleanup will have 
cleaned up the lists.


>>>
>>> +    if (unlikely(ret == -EALREADY)) {
>>> +        struct drm_exec_objects *container = &exec->duplicates;
>>> +
>>> +        /*
>>> +         * If this is the first locked GEM object it was most likely
>>> +         * just contended. So don't add it to the duplicates, just
>>> +         * reserve the fence slots.
>>
>> I don't understand this. Seems a bit arbitrary. Is it even legal to 
>> try to add the same object twice? I thought duplicates was for 
>> different objects that share the same reservation, not actually the 
>> same object on the same list twice.
>
> Yes, it's legal to try the same object twice. That can easily happen 
> when userspace specifies the same handle twice for a submission.
>
> The other possibility is that we had a contention, backed off and then
> locked the contended object first and then re-tried everything else once more.
>
>> Maybe you meant to compare with 
>> exec->locked.objects[exec->locked.num_objects-1], assuming that 
>> drm_exec_lock_contended just succeeded locking a previously contended 
>> object, and the caller retried locking that same object again?
>
> Yes, that's pretty much the idea. But then exec->locked.num_objects is
> just 1 so that's equal to checking exec->locked.objects[0].

Ok. I missed that after drm_exec_continue_on_contention the 
drm_exec_cleanup will clean up the lists and we start over locking all 
the objects. Why can't drm_exec_cleanup just lock the contended object 
itself? I think that would make it more obvious that the contended 
object is always the first one that gets locked and added to the list.
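
Something like this is what I'm picturing, just as a sketch (how to
propagate an error from the interruptible slow-path lock is left open):

bool drm_exec_cleanup(struct drm_exec *exec)
{
	if (likely(!exec->contended)) {
		ww_acquire_done(&exec->ticket);
		return false;
	}

	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
		exec->contended = NULL;
		ww_acquire_init(&exec->ticket, &reservation_ww_class);
		return true;
	}

	drm_exec_unlock_all(exec);
	exec->locked.num_objects = 0;
	exec->duplicates.num_objects = 0;
	/* hypothetical: take the slow-path lock on the contended object right
	 * here so the next iteration starts with it already locked */
	drm_exec_lock_contended(exec);
	return true;
}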

Thanks,
   Felix


>
> On the other hand we would also catch cases where we lock the same 
> object multiple times directly behind each other. Need to think about 
> that a bit.
>
>>> +         */
>>> +        if (exec->locked.num_objects && exec->locked.objects[0] == 
>>> obj)
>>> +            container = NULL;
>>> +
>>> +        ret = drm_exec_obj_locked(container, obj, num_fences);
>>> +        if (ret)
>>> +            return ret;
>>> +
>>> +    } else if (unlikely(ret)) {
>>> +        return ret;
>>> +
>>> +    } else {
>>> +        ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
>>> +        if (ret)
>>> +            goto error_unlock;
>>> +    }
>>> +
>>> +    drm_gem_object_get(obj);
>>
>> The container already gets a reference to obj. What is this extra 
>> reference for? And where does it get dropped?
>
> Need to double check this, might be that it's just a leftover.
>
> Thanks,
> Christian.
>
>>
>> Regards,
>>   Felix
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/8] drm: execution context for GEM buffers v2
@ 2022-05-04  7:44 Christian König
  0 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2022-05-04  7:44 UTC (permalink / raw)
  To: dri-devel, amd-gfx, daniel; +Cc: Christian König

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTM execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstract the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.

v2: drop xarray and use a dynamically resized array instead, the locking
    overhead is unnecessary and measurable.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   7 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 295 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 144 +++++++++++++++++
 5 files changed, 460 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index f32ccce5722d..bf7dd2a78e9b 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -493,6 +493,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e88c497fa010..1b35c10df263 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -179,6 +179,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
@@ -252,6 +258,7 @@ config DRM_AMDGPU
 	select DRM_SCHED
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select POWER_SUPPLY
 	select HWMON
 	select BACKLIGHT_CLASS_DEVICE
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 15fe3163f822..ee8573b683f3 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -37,6 +37,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..ed2106c22786
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,295 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc..).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked fences slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_lock(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Initialize the drm_exec_objects container */
+static void drm_exec_objects_init(struct drm_exec_objects *container)
+{
+	container->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	container->max_objects = container->objects ?
+		PAGE_SIZE / sizeof(void *) : 0;
+	container->num_objects = 0;
+}
+
+/* Cleanup the drm_exec_objects container */
+static void drm_exec_objects_fini(struct drm_exec_objects *container)
+{
+	kvfree(container->objects);
+}
+
+/* Make sure we have enough room and add an object to the container */
+static int drm_exec_objects_add(struct drm_exec_objects *container,
+				struct drm_gem_object *obj)
+{
+	if (unlikely(container->num_objects == container->max_objects)) {
+		size_t size = container->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(container->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		container->objects = tmp;
+		container->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	container->objects[container->num_objects++] = obj;
+	return 0;
+}
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_duplicate_object(exec, index, obj)
+		drm_gem_object_put(obj);
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptible
+ *
+ * Initialize the object and make sure that we can track locked and duplicate
+ * objects.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	drm_exec_objects_init(&exec->locked);
+	drm_exec_objects_init(&exec->duplicates);
+	exec->contended = DRM_EXEC_DUMMY;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	drm_exec_objects_fini(&exec->locked);
+	drm_exec_objects_fini(&exec->duplicates);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Cleanup the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->locked.num_objects = 0;
+	exec->duplicates.num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the container and reserve fences */
+static int drm_exec_obj_locked(struct drm_exec_objects *container,
+			       struct drm_gem_object *obj,
+			       unsigned int num_fences)
+{
+	int ret;
+
+	if (container) {
+		ret = drm_exec_objects_add(container, obj);
+		if (ret)
+			return ret;
+	}
+
+	if (num_fences) {
+		ret = dma_resv_reserve_fences(obj->resv, num_fences);
+		if (ret)
+			goto error_erase;
+	}
+
+	return 0;
+
+error_erase:
+	if (container) {
+		--container->num_objects;
+		drm_gem_object_put(obj);
+	}
+	return ret;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(&exec->locked, obj, 0);
+	if (unlikely(ret))
+		dma_resv_unlock(obj->resv);
+
+error_dropref:
+	/* Always clean up the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container. Duplicates
+ * are detected as well and automatically moved into the duplicates container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -ENOMEM when a memory
+ * allocation fails and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY)) {
+		struct drm_exec_objects *container = &exec->duplicates;
+
+		/*
+		 * If this is the first locked GEM object it was most likely
+		 * just contended. So don't add it to the duplicates, just
+		 * reserve the fence slots.
+		 */
+		if (exec->locked.num_objects && exec->locked.objects[0] == obj)
+			container = NULL;
+
+		ret = drm_exec_obj_locked(container, obj, num_fences);
+		if (ret)
+			return ret;
+
+	} else if (unlikely(ret)) {
+		return ret;
+
+	} else {
+		ret = drm_exec_obj_locked(&exec->locked, obj, num_fences);
+		if (ret)
+			goto error_unlock;
+	}
+
+	drm_gem_object_get(obj);
+	return 0;
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..f73981c6292e
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec_objects - Container for GEM objects in a drm_exec
+ */
+struct drm_exec_objects {
+	/** @num_objects: number of objects currently in the container */
+	unsigned int		num_objects;
+	/** @max_objects: number of objects the container has room for */
+	unsigned int		max_objects;
+	/** @objects: array of tracked GEM objects */
+	struct drm_gem_object	**objects;
+};
+
+/**
+ * drm_exec_objects_for_each - iterate over all the objects inside the container
+ * @array: the drm_exec_objects container to iterate over
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ */
+#define drm_exec_objects_for_each(array, index, obj)		\
+	for (index = 0, obj = (array)->objects[0];		\
+	     index < (array)->num_objects;			\
+	     ++index, obj = (array)->objects[index])
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptibly
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @locked: container for the locked GEM objects
+	 */
+	struct drm_exec_objects	locked;
+
+	/**
+	 * @duplicates: container for the duplicated GEM objects
+	 */
+	struct drm_exec_objects	duplicates;
+
+	/**
+	 * @contended: contended GEM object we backed off for.
+	 */
+	struct drm_gem_object	*contended;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->locked, index, obj)
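+
+/*
+ * As an illustration only (not part of this patch): once the locking retry
+ * loop has finished, a driver could use this iterator to attach its job
+ * fence to all locked objects. "exec" and "fence" are assumed to come from
+ * the driver, and the dma_resv_add_fence()/DMA_RESV_USAGE_WRITE interface
+ * is assumed to be available:
+ *
+ *	unsigned long index;
+ *	struct drm_gem_object *obj;
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj)
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_WRITE);
+ */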
+
+/**
+ * drm_exec_for_each_duplicate_object - iterate over all the duplicate objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the duplicate GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_duplicate_object(exec, index, obj)	\
+	drm_exec_objects_for_each(&(exec)->duplicates, index, obj)
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
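+
+/*
+ * A minimal usage sketch (hypothetical driver code, not part of this patch;
+ * "bo_a" and "bo_b" stand in for whatever GEM objects the driver needs to
+ * lock, each with one fence slot reserved):
+ *
+ *	struct drm_exec exec;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, bo_a, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_prepare_obj(&exec, bo_b, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	... submit the job and add fences to the locked objects ...
+ *
+ * error:
+ *	drm_exec_fini(&exec);
+ */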
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to cleanup
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a subordinate loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a subordinate loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+/**
+ * drm_exec_has_duplicates - check for duplicated GEM objects
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has encountered some already locked GEM
+ * objects while trying to lock them. This can happen if multiple GEM objects
+ * share the same underlying resv object.
+ */
+static inline bool drm_exec_has_duplicates(struct drm_exec *exec)
+{
+	return exec->duplicates.num_objects > 0;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+
+#endif
-- 
2.25.1



Thread overview: 22+ messages
2022-05-04  7:47 [RFC] Common DRM execution context v2 Christian König
2022-05-04  7:47 ` [PATCH 1/8] drm: execution context for GEM buffers v2 Christian König
2022-05-09 14:31   ` Daniel Vetter
2022-05-09 14:31     ` Daniel Vetter
2022-05-09 15:01     ` Christian König
2022-05-11 15:23       ` Daniel Vetter
2022-05-11 15:23         ` Daniel Vetter
2022-06-09  0:05   ` Felix Kuehling
2022-06-09  7:18     ` Christian König
2022-06-09 14:28       ` Felix Kuehling
2022-05-04  7:47 ` [PATCH 2/8] drm: add drm_exec selftests Christian König
2022-05-04  7:47 ` [PATCH 3/8] drm/amdkfd: switch over to using drm_exec Christian König
2022-06-09  0:04   ` Felix Kuehling
2022-05-04  7:47 ` [PATCH 4/8] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
2022-05-04  7:47 ` [PATCH 5/8] drm/amdgpu: use the new drm_exec object for CS Christian König
2022-05-04  7:47 ` [PATCH 6/8] drm/radeon: switch over to drm_exec Christian König
2022-05-04  7:47 ` [PATCH 7/8] drm/qxl: switch to using drm_exec Christian König
2022-05-04  7:47 ` [PATCH 8/8] drm: move ttm_execbuf_util into vmwgfx Christian König
2022-05-09 14:33   ` Daniel Vetter
2022-05-09 14:33     ` Daniel Vetter
2022-05-09 15:02     ` Christian König
