* Common DRM execution context v4
@ 2023-05-04 11:51 Christian König
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
                   ` (12 more replies)
  0 siblings, 13 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Hi guys,

this is the well-known patch set by now. I've tried to address all review
comments and extended the set to also replace
drm_gem_lock_reservations(), as suggested by Thomas.

I won't have much time to work on this in the next few weeks, so feel
free to pick up this work and commit it when you need it.

Regards,
Christian.



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-05-04 11:51 Common DRM execution context v4 Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 14:02   ` Thomas Hellström (Intel)
                     ` (2 more replies)
  2023-05-04 11:51 ` [PATCH 02/13] drm: add drm_exec selftests v2 Christian König
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTMs execbuf util and intended to replace
it in the long term.

The basic functionality is that it abstracts the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.
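
A minimal driver-side usage sketch of the API added here (boA, boB and
fence are placeholder driver objects, this simply mirrors the pattern
from the Overview documentation below):

	struct drm_gem_object *obj;
	struct drm_exec exec;
	unsigned long index;
	int ret;

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		/* Lock boA and reserve one fence slot on it */
		ret = drm_exec_prepare_obj(&exec, boA, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

		/* Same for boB, on contention the loop restarts */
		ret = drm_exec_prepare_obj(&exec, boB, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;
	}

	/* All objects are locked at this point, add the fence to each one */
	drm_exec_for_each_locked_object(&exec, index, obj)
		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);

	drm_exec_fini(&exec);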

v2: drop xarray and use a dynamically resized array instead, the locking
    overhead is unnecessary and measurable.
v3: drop duplicate tracking, radeon is really the only one needing that.
v4: fix issues pointed out by Danilo and some typos in comments, add a
    helper for locking arrays of GEM objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   6 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 278 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 119 +++++++++++++++
 5 files changed, 417 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index a79fd3549ff8..a52e6f4117d6 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -493,6 +493,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index ba3fb04bb691..2dc81eb062eb 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -201,6 +201,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index a33257d2bc7f..9c6446eb3c83 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..18071bff20f4
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,278 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc.).
+ *
+ * If a contention is detected while locking a GEM object, the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked, fence slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ *	struct drm_gem_object *obj;
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, true);
+ *	drm_exec_while_not_all_locked(&exec) {
+ *		ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *
+ *		ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *		drm_exec_continue_on_contention(&exec);
+ *		if (ret)
+ *			goto error;
+ *	}
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Dummy value used to initially enter the retry loop */
+#define DRM_EXEC_DUMMY (void*)~0
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+
+	drm_gem_object_put(exec->prelocked);
+	exec->prelocked = NULL;
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @interruptible: if locks should be acquired interruptibly
+ *
+ * Initialize the object and make sure that we can track locked objects.
+ */
+void drm_exec_init(struct drm_exec *exec, bool interruptible)
+{
+	exec->interruptible = interruptible;
+	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
+	exec->num_objects = 0;
+	exec->contended = DRM_EXEC_DUMMY;
+	exec->prelocked = NULL;
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	kvfree(exec->objects);
+	if (exec->contended != DRM_EXEC_DUMMY) {
+		drm_gem_object_put(exec->contended);
+		ww_acquire_fini(&exec->ticket);
+	}
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_cleanup - cleanup when contention is detected
+ * @exec: the drm_exec object to cleanup
+ *
+ * Clean up the current state and return true if we should stay inside the retry
+ * loop, false if there wasn't any contention detected and we can keep the
+ * objects locked.
+ */
+bool drm_exec_cleanup(struct drm_exec *exec)
+{
+	if (likely(!exec->contended)) {
+		ww_acquire_done(&exec->ticket);
+		return false;
+	}
+
+	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
+		exec->contended = NULL;
+		ww_acquire_init(&exec->ticket, &reservation_ww_class);
+		return true;
+	}
+
+	drm_exec_unlock_all(exec);
+	exec->num_objects = 0;
+	return true;
+}
+EXPORT_SYMBOL(drm_exec_cleanup);
+
+/* Track the locked object in the array */
+static int drm_exec_obj_locked(struct drm_exec *exec,
+			       struct drm_gem_object *obj)
+{
+	if (unlikely(exec->num_objects == exec->max_objects)) {
+		size_t size = exec->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		exec->objects = tmp;
+		exec->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	exec->objects[exec->num_objects++] = obj;
+
+	return 0;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->interruptible) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (unlikely(ret)) {
+		dma_resv_unlock(obj->resv);
+		goto error_dropref;
+	}
+
+	swap(exec->prelocked, obj);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -EALREADY when the object is
+ * already locked, -ENOMEM when memory allocation fails and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->prelocked == obj) {
+		drm_gem_object_put(exec->prelocked);
+		exec->prelocked = NULL;
+
+		return dma_resv_reserve_fences(obj->resv, num_fences);
+	}
+
+	if (exec->interruptible)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret))
+		return ret;
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (ret)
+		goto error_unlock;
+
+	/* Keep locked when reserving fences fails */
+	return dma_resv_reserve_fences(obj->resv, num_fences);
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+/**
+ * drm_exec_prepare_array - helper to prepare an array of objects
+ * @exec: the drm_exec object with the state
+ * @objects: array of GEM object to prepare
+ * @num_objects: number of GEM objects in the array
+ * @num_fences: number of fences to reserve on each GEM object
+ *
+ * Prepares all GEM objects in an array, handles contention but aborts on the
+ * first error otherwise. Reserves @num_fences on each GEM object after locking it.
+ *
+ * Returns: -EALREADY when an object is already locked, -ENOMEM when memory
+ * allocation fails and zero for success.
+ */
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences)
+{
+	int ret;
+
+	drm_exec_while_not_all_locked(exec) {
+		for (unsigned int i = 0; i < num_objects; ++i) {
+			ret = drm_exec_prepare_obj(exec, objects[i],
+						   num_fences);
+			drm_exec_break_on_contention(exec);
+			if (unlikely(ret))
+				return ret;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_exec_prepare_array);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..7c7481ed088a
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @interruptible: If locks should be taken interruptibly
+	 */
+	bool			interruptible;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @num_objects: number of objects locked
+	 */
+	unsigned int		num_objects;
+
+	/**
+	 * @max_objects: maximum objects in array
+	 */
+	unsigned int		max_objects;
+
+	/**
+	 * @objects: array of the locked objects
+	 */
+	struct drm_gem_object	**objects;
+
+	/**
+	 * @contended: contended GEM object we backed off for
+	 */
+	struct drm_gem_object	*contended;
+
+	/**
+	 * @prelocked: already locked GEM object due to contention
+	 */
+	struct drm_gem_object *prelocked;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	for (index = 0, obj = (exec)->objects[0];		\
+	     index < (exec)->num_objects;			\
+	     ++index, obj = (exec)->objects[index])
+
+/**
+ * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
+ * @exec: drm_exec object
+ *
+ * Core functionality of the drm_exec object. Loops until all GEM objects are
+ * prepared and no more contention exists.
+ *
+ * At the beginning of the loop it is guaranteed that no GEM object is locked.
+ */
+#define drm_exec_while_not_all_locked(exec)	\
+	while (drm_exec_cleanup(exec))
+
+/**
+ * drm_exec_continue_on_contention - continue the loop when we need to clean up
+ * @exec: drm_exec object
+ *
+ * Control flow helper to continue when a contention was detected and we need to
+ * clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_continue_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		continue
+
+/**
+ * drm_exec_break_on_contention - break a subordinate loop on contention
+ * @exec: drm_exec object
+ *
+ * Control flow helper to break a subordinate loop when a contention was detected
+ * and we need to clean up and re-start the loop to prepare all GEM objects.
+ */
+#define drm_exec_break_on_contention(exec)		\
+	if (unlikely(drm_exec_is_contended(exec)))	\
+		break
+
+/**
+ * drm_exec_is_contended - check for contention
+ * @exec: drm_exec object
+ *
+ * Returns true if the drm_exec object has run into some contention while
+ * locking a GEM object and needs to clean up.
+ */
+static inline bool drm_exec_is_contended(struct drm_exec *exec)
+{
+	return !!exec->contended;
+}
+
+void drm_exec_init(struct drm_exec *exec, bool interruptible);
+void drm_exec_fini(struct drm_exec *exec);
+bool drm_exec_cleanup(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences);
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/13] drm: add drm_exec selftests v2
  2023-05-04 11:51 Common DRM execution context v4 Christian König
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 12:07   ` Maíra Canal
  2023-05-04 11:51 ` [PATCH 03/13] drm/amdkfd: switch over to using drm_exec v2 Christian König
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Largely just the initial skeleton.

v2: add array test as well

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/Kconfig               |  1 +
 drivers/gpu/drm/tests/Makefile        |  3 +-
 drivers/gpu/drm/tests/drm_exec_test.c | 96 +++++++++++++++++++++++++++
 3 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/tests/drm_exec_test.c

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2dc81eb062eb..068e574e234e 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -80,6 +80,7 @@ config DRM_KUNIT_TEST
 	select DRM_BUDDY
 	select DRM_EXPORT_FOR_TESTS if m
 	select DRM_KUNIT_TEST_HELPERS
+	select DRM_EXEC
 	default KUNIT_ALL_TESTS
 	help
 	  This builds unit tests for DRM. This option is not useful for
diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile
index bca726a8f483..ba7baa622675 100644
--- a/drivers/gpu/drm/tests/Makefile
+++ b/drivers/gpu/drm/tests/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
 	drm_modes_test.o \
 	drm_plane_helper_test.o \
 	drm_probe_helper_test.o \
-	drm_rect_test.o
+	drm_rect_test.o	\
+	drm_exec_test.o
 
 CFLAGS_drm_mm_test.o := $(DISABLE_STRUCTLEAK_PLUGIN)
diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
new file mode 100644
index 000000000000..26aa13e62d22
--- /dev/null
+++ b/drivers/gpu/drm/tests/drm_exec_test.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#define pr_fmt(fmt) "drm_exec: " fmt
+
+#include <kunit/test.h>
+
+#include <linux/module.h>
+#include <linux/prime_numbers.h>
+
+#include <drm/drm_exec.h>
+#include <drm/drm_device.h>
+#include <drm/drm_gem.h>
+
+#include "../lib/drm_random.h"
+
+static struct drm_device dev;
+
+static void drm_exec_sanitycheck(struct kunit *test)
+{
+	struct drm_exec exec;
+
+	drm_exec_init(&exec, true);
+	drm_exec_fini(&exec);
+	pr_info("%s - ok!\n", __func__);
+}
+
+static void drm_exec_lock1(struct kunit *test)
+{
+	struct drm_gem_object gobj = { };
+	struct drm_exec exec;
+	int ret;
+
+	drm_gem_private_object_init(&dev, &gobj, PAGE_SIZE);
+
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		ret = drm_exec_prepare_obj(&exec, &gobj, 1);
+		drm_exec_continue_on_contention(&exec);
+		if (ret) {
+			drm_exec_fini(&exec);
+			pr_err("%s - err %d!\n", __func__, ret);
+			return;
+		}
+	}
+	drm_exec_fini(&exec);
+	pr_info("%s - ok!\n", __func__);
+}
+
+static void drm_exec_lock_array(struct kunit *test)
+{
+	struct drm_gem_object gobj1 = { };
+	struct drm_gem_object gobj2 = { };
+	struct drm_gem_object *array[] = { &gobj1, &gobj2 };
+	struct drm_exec exec;
+	int ret;
+
+	drm_gem_private_object_init(&dev, &gobj1, PAGE_SIZE);
+	drm_gem_private_object_init(&dev, &gobj2, PAGE_SIZE);
+
+	drm_exec_init(&exec, true);
+	ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 0);
+	if (ret) {
+		drm_exec_fini(&exec);
+		pr_err("%s - err %d!\n", __func__, ret);
+		return;
+	}
+	drm_exec_fini(&exec);
+	pr_info("%s - ok!\n", __func__);
+}
+
+static int drm_exec_suite_init(struct kunit_suite *suite)
+{
+	kunit_info(suite, "Testing DRM exec manager\n");
+	return 0;
+}
+
+static struct kunit_case drm_exec_tests[] = {
+	KUNIT_CASE(drm_exec_sanitycheck),
+	KUNIT_CASE(drm_exec_lock1),
+	KUNIT_CASE(drm_exec_lock_array),
+	{}
+};
+
+static struct kunit_suite drm_exec_test_suite = {
+	.name = "drm_exec",
+	.suite_init = drm_exec_suite_init,
+	.test_cases = drm_exec_tests,
+};
+
+kunit_test_suite(drm_exec_test_suite);
+
+MODULE_AUTHOR("AMD");
+MODULE_LICENSE("GPL and additional rights");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/13] drm/amdkfd: switch over to using drm_exec v2
  2023-05-04 11:51 Common DRM execution context v4 Christian König
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
  2023-05-04 11:51 ` [PATCH 02/13] drm: add drm_exec selftests v2 Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 04/13] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Avoids quite a bit of logic and kmalloc overhead.

v2: fix multiple problems pointed out by Felix
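
As a simplified sketch of the conversion pattern used throughout this
patch (taken from the reserve_bo_and_vm() hunk below, error handling
trimmed), the kmalloc'ed ttm_validate_buffer lists are replaced by a
drm_exec retry loop together with the new amdgpu_vm_lock_pd() helper:

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		/* Lock the root page directory of the VM ... */
		ret = amdgpu_vm_lock_pd(vm, &exec);
		if (likely(!ret))
			/* ... and the KFD BO itself, no fence slots needed */
			ret = drm_exec_prepare_obj(&exec, &bo->tbo.base, 0);
		drm_exec_continue_on_contention(&exec);
		if (unlikely(ret))
			goto error;
	}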

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Kconfig            |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   5 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 302 +++++++-----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  14 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  46 +--
 6 files changed, 161 insertions(+), 210 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 12adca8c7819..fcad4ea30a0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -21,6 +21,7 @@ config DRM_AMDGPU
 	select INTERVAL_TREE
 	select DRM_BUDDY
 	select DRM_SUBALLOC_HELPER
+	select DRM_EXEC
 	# amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work
 	# ACPI_VIDEO's dependencies must also be selected.
 	select INPUT if ACPI
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 01ba3589b60a..dfb41d56d236 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -25,13 +25,13 @@
 #ifndef AMDGPU_AMDKFD_H_INCLUDED
 #define AMDGPU_AMDKFD_H_INCLUDED
 
+#include <linux/list.h>
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/kthread.h>
 #include <linux/workqueue.h>
 #include <linux/mmu_notifier.h>
 #include <kgd_kfd_interface.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 #include "amdgpu_sync.h"
 #include "amdgpu_vm.h"
 
@@ -69,8 +69,7 @@ struct kgd_mem {
 	struct hmm_range *range;
 	struct list_head attachments;
 	/* protected by amdkfd_process_info.lock */
-	struct ttm_validate_buffer validate_list;
-	struct ttm_validate_buffer resv_list;
+	struct list_head validate_list;
 	uint32_t domain;
 	unsigned int mapped_to_gpu_memory;
 	uint64_t va;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 83a83ced2439..75d394bb52b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -27,6 +27,8 @@
 #include <linux/sched/task.h>
 #include <drm/ttm/ttm_tt.h>
 
+#include <drm/drm_exec.h>
+
 #include "amdgpu_object.h"
 #include "amdgpu_gem.h"
 #include "amdgpu_vm.h"
@@ -925,28 +927,20 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
 				struct amdkfd_process_info *process_info,
 				bool userptr)
 {
-	struct ttm_validate_buffer *entry = &mem->validate_list;
-	struct amdgpu_bo *bo = mem->bo;
-
-	INIT_LIST_HEAD(&entry->head);
-	entry->num_shared = 1;
-	entry->bo = &bo->tbo;
 	mutex_lock(&process_info->lock);
 	if (userptr)
-		list_add_tail(&entry->head, &process_info->userptr_valid_list);
+		list_add_tail(&mem->validate_list,
+			      &process_info->userptr_valid_list);
 	else
-		list_add_tail(&entry->head, &process_info->kfd_bo_list);
+		list_add_tail(&mem->validate_list, &process_info->kfd_bo_list);
 	mutex_unlock(&process_info->lock);
 }
 
 static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem *mem,
 		struct amdkfd_process_info *process_info)
 {
-	struct ttm_validate_buffer *bo_list_entry;
-
-	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
-	list_del(&bo_list_entry->head);
+	list_del(&mem->validate_list);
 	mutex_unlock(&process_info->lock);
 }
 
@@ -1033,13 +1027,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
  * object can track VM updates.
  */
 struct bo_vm_reservation_context {
-	struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
-	unsigned int n_vms;		    /* Number of VMs reserved	    */
-	struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
-	struct ww_acquire_ctx ticket;	    /* Reservation ticket	    */
-	struct list_head list, duplicates;  /* BO lists			    */
-	struct amdgpu_sync *sync;	    /* Pointer to sync object	    */
-	bool reserved;			    /* Whether BOs are reserved	    */
+	/* DRM execution context for the reservation */
+	struct drm_exec exec;
+	/* Number of VMs reserved */
+	unsigned int n_vms;
+	/* Pointer to sync object */
+	struct amdgpu_sync *sync;
 };
 
 enum bo_vm_match {
@@ -1063,35 +1056,24 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 
 	WARN_ON(!vm);
 
-	ctx->reserved = false;
 	ctx->n_vms = 1;
 	ctx->sync = &mem->sync;
-
-	INIT_LIST_HEAD(&ctx->list);
-	INIT_LIST_HEAD(&ctx->duplicates);
-
-	ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
-	if (!ctx->vm_pd)
-		return -ENOMEM;
-
-	ctx->kfd_bo.priority = 0;
-	ctx->kfd_bo.tv.bo = &bo->tbo;
-	ctx->kfd_bo.tv.num_shared = 1;
-	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
-
-	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
-
-	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-				     false, &ctx->duplicates);
-	if (ret) {
-		pr_err("Failed to reserve buffers in ttm.\n");
-		kfree(ctx->vm_pd);
-		ctx->vm_pd = NULL;
-		return ret;
+	drm_exec_init(&ctx->exec, true);
+	drm_exec_while_not_all_locked(&ctx->exec) {
+		ret = amdgpu_vm_lock_pd(vm, &ctx->exec);
+		if (likely(!ret))
+			ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base,
+						   0);
+		drm_exec_continue_on_contention(&ctx->exec);
+		if (unlikely(ret))
+			goto error;
 	}
-
-	ctx->reserved = true;
 	return 0;
+
+error:
+	pr_err("Failed to reserve buffers in ttm.\n");
+	drm_exec_fini(&ctx->exec);
+	return ret;
 }
 
 /**
@@ -1108,63 +1090,40 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 				struct amdgpu_vm *vm, enum bo_vm_match map_type,
 				struct bo_vm_reservation_context *ctx)
 {
-	struct amdgpu_bo *bo = mem->bo;
 	struct kfd_mem_attachment *entry;
-	unsigned int i;
+	struct amdgpu_bo *bo = mem->bo;
 	int ret;
 
-	ctx->reserved = false;
-	ctx->n_vms = 0;
-	ctx->vm_pd = NULL;
 	ctx->sync = &mem->sync;
+	drm_exec_init(&ctx->exec, true);
+	drm_exec_while_not_all_locked(&ctx->exec) {
+		ctx->n_vms = 0;
+		list_for_each_entry(entry, &mem->attachments, list) {
+			if ((vm && vm != entry->bo_va->base.vm) ||
+				(entry->is_mapped != map_type
+				&& map_type != BO_VM_ALL))
+				continue;
 
-	INIT_LIST_HEAD(&ctx->list);
-	INIT_LIST_HEAD(&ctx->duplicates);
-
-	list_for_each_entry(entry, &mem->attachments, list) {
-		if ((vm && vm != entry->bo_va->base.vm) ||
-			(entry->is_mapped != map_type
-			&& map_type != BO_VM_ALL))
-			continue;
-
-		ctx->n_vms++;
-	}
-
-	if (ctx->n_vms != 0) {
-		ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
-				     GFP_KERNEL);
-		if (!ctx->vm_pd)
-			return -ENOMEM;
-	}
-
-	ctx->kfd_bo.priority = 0;
-	ctx->kfd_bo.tv.bo = &bo->tbo;
-	ctx->kfd_bo.tv.num_shared = 1;
-	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
-
-	i = 0;
-	list_for_each_entry(entry, &mem->attachments, list) {
-		if ((vm && vm != entry->bo_va->base.vm) ||
-			(entry->is_mapped != map_type
-			&& map_type != BO_VM_ALL))
-			continue;
-
-		amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
-				&ctx->vm_pd[i]);
-		i++;
-	}
+			ret = amdgpu_vm_lock_pd(entry->bo_va->base.vm,
+						&ctx->exec);
+			drm_exec_break_on_contention(&ctx->exec);
+			if (unlikely(ret))
+				goto error;
+			++ctx->n_vms;
+		}
+		drm_exec_continue_on_contention(&ctx->exec);
 
-	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-				     false, &ctx->duplicates);
-	if (ret) {
-		pr_err("Failed to reserve buffers in ttm.\n");
-		kfree(ctx->vm_pd);
-		ctx->vm_pd = NULL;
-		return ret;
+		ret = drm_exec_prepare_obj(&ctx->exec, &bo->tbo.base, 1);
+		drm_exec_continue_on_contention(&ctx->exec);
+		if (unlikely(ret))
+			goto error;
 	}
-
-	ctx->reserved = true;
 	return 0;
+
+error:
+	pr_err("Failed to reserve buffers in ttm.\n");
+	drm_exec_fini(&ctx->exec);
+	return ret;
 }
 
 /**
@@ -1185,15 +1144,8 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 	if (wait)
 		ret = amdgpu_sync_wait(ctx->sync, intr);
 
-	if (ctx->reserved)
-		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
-	kfree(ctx->vm_pd);
-
+	drm_exec_fini(&ctx->exec);
 	ctx->sync = NULL;
-
-	ctx->reserved = false;
-	ctx->vm_pd = NULL;
-
 	return ret;
 }
 
@@ -1783,7 +1735,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	bool use_release_notifier = (mem->bo->kfd_bo == mem);
 	struct kfd_mem_attachment *entry, *tmp;
 	struct bo_vm_reservation_context ctx;
-	struct ttm_validate_buffer *bo_list_entry;
 	unsigned int mapped_to_gpu_memory;
 	int ret;
 	bool is_imported = false;
@@ -1811,9 +1762,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	}
 
 	/* Make sure restore workers don't access the BO any more */
-	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
-	list_del(&bo_list_entry->head);
+	list_del(&mem->validate_list);
 	mutex_unlock(&process_info->lock);
 
 	/* Cleanup user pages and MMU notifiers */
@@ -2376,14 +2326,14 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 	/* Move all invalidated BOs to the userptr_inval_list */
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_valid_list,
-				 validate_list.head)
+				 validate_list)
 		if (mem->invalid)
-			list_move_tail(&mem->validate_list.head,
+			list_move_tail(&mem->validate_list,
 				       &process_info->userptr_inval_list);
 
 	/* Go through userptr_inval_list and update any invalid user_pages */
 	list_for_each_entry(mem, &process_info->userptr_inval_list,
-			    validate_list.head) {
+			    validate_list) {
 		invalid = mem->invalid;
 		if (!invalid)
 			/* BO hasn't been invalidated since the last
@@ -2461,50 +2411,43 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
  */
 static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 {
-	struct amdgpu_bo_list_entry *pd_bo_list_entries;
-	struct list_head resv_list, duplicates;
-	struct ww_acquire_ctx ticket;
+	struct ttm_operation_ctx ctx = { false, false };
 	struct amdgpu_sync sync;
+	struct drm_exec exec;
 
 	struct amdgpu_vm *peer_vm;
 	struct kgd_mem *mem, *tmp_mem;
 	struct amdgpu_bo *bo;
-	struct ttm_operation_ctx ctx = { false, false };
-	int i, ret;
-
-	pd_bo_list_entries = kcalloc(process_info->n_vms,
-				     sizeof(struct amdgpu_bo_list_entry),
-				     GFP_KERNEL);
-	if (!pd_bo_list_entries) {
-		pr_err("%s: Failed to allocate PD BO list entries\n", __func__);
-		ret = -ENOMEM;
-		goto out_no_mem;
-	}
-
-	INIT_LIST_HEAD(&resv_list);
-	INIT_LIST_HEAD(&duplicates);
+	int ret;
 
-	/* Get all the page directory BOs that need to be reserved */
-	i = 0;
-	list_for_each_entry(peer_vm, &process_info->vm_list_head,
-			    vm_list_node)
-		amdgpu_vm_get_pd_bo(peer_vm, &resv_list,
-				    &pd_bo_list_entries[i++]);
-	/* Add the userptr_inval_list entries to resv_list */
-	list_for_each_entry(mem, &process_info->userptr_inval_list,
-			    validate_list.head) {
-		list_add_tail(&mem->resv_list.head, &resv_list);
-		mem->resv_list.bo = mem->validate_list.bo;
-		mem->resv_list.num_shared = mem->validate_list.num_shared;
-	}
+	amdgpu_sync_create(&sync);
 
+	drm_exec_init(&exec, true);
 	/* Reserve all BOs and page tables for validation */
-	ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
-	WARN(!list_empty(&duplicates), "Duplicates should be empty");
-	if (ret)
-		goto out_free;
+	drm_exec_while_not_all_locked(&exec) {
+		/* Reserve all the page directories */
+		list_for_each_entry(peer_vm, &process_info->vm_list_head,
+				    vm_list_node) {
+			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto unreserve_out;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	amdgpu_sync_create(&sync);
+		/* Reserve the BOs on the userptr_inval_list */
+		list_for_each_entry(mem, &process_info->userptr_inval_list,
+				    validate_list) {
+			struct drm_gem_object *gobj;
+
+			gobj = &mem->bo->tbo.base;
+			ret = drm_exec_prepare_obj(&exec, gobj, 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto unreserve_out;
+		}
+		drm_exec_continue_on_contention(&exec);
+	}
 
 	ret = process_validate_vms(process_info);
 	if (ret)
@@ -2513,7 +2456,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	/* Validate BOs and update GPUVM page tables */
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_inval_list,
-				 validate_list.head) {
+				 validate_list) {
 		struct kfd_mem_attachment *attachment;
 
 		bo = mem->bo;
@@ -2555,12 +2498,9 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	ret = process_update_pds(process_info, &sync);
 
 unreserve_out:
-	ttm_eu_backoff_reservation(&ticket, &resv_list);
+	drm_exec_fini(&exec);
 	amdgpu_sync_wait(&sync, false);
 	amdgpu_sync_free(&sync);
-out_free:
-	kfree(pd_bo_list_entries);
-out_no_mem:
 
 	return ret;
 }
@@ -2576,7 +2516,7 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
 
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_inval_list,
-				 validate_list.head) {
+				 validate_list) {
 		bool valid = amdgpu_ttm_tt_get_user_pages_done(
 				mem->bo->tbo.ttm, mem->range);
 
@@ -2588,7 +2528,7 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
 		}
 		WARN(mem->invalid, "Valid BO is marked invalid");
 
-		list_move_tail(&mem->validate_list.head,
+		list_move_tail(&mem->validate_list,
 			       &process_info->userptr_valid_list);
 	}
 
@@ -2698,50 +2638,46 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
  */
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 {
-	struct amdgpu_bo_list_entry *pd_bo_list;
 	struct amdkfd_process_info *process_info = info;
 	struct amdgpu_vm *peer_vm;
 	struct kgd_mem *mem;
-	struct bo_vm_reservation_context ctx;
 	struct amdgpu_amdkfd_fence *new_fence;
-	int ret = 0, i;
 	struct list_head duplicate_save;
 	struct amdgpu_sync sync_obj;
 	unsigned long failed_size = 0;
 	unsigned long total_size = 0;
+	struct drm_exec exec;
+	int ret;
 
 	INIT_LIST_HEAD(&duplicate_save);
-	INIT_LIST_HEAD(&ctx.list);
-	INIT_LIST_HEAD(&ctx.duplicates);
-
-	pd_bo_list = kcalloc(process_info->n_vms,
-			     sizeof(struct amdgpu_bo_list_entry),
-			     GFP_KERNEL);
-	if (!pd_bo_list)
-		return -ENOMEM;
 
-	i = 0;
 	mutex_lock(&process_info->lock);
-	list_for_each_entry(peer_vm, &process_info->vm_list_head,
-			vm_list_node)
-		amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++]);
 
-	/* Reserve all BOs and page tables/directory. Add all BOs from
-	 * kfd_bo_list to ctx.list
-	 */
-	list_for_each_entry(mem, &process_info->kfd_bo_list,
-			    validate_list.head) {
-
-		list_add_tail(&mem->resv_list.head, &ctx.list);
-		mem->resv_list.bo = mem->validate_list.bo;
-		mem->resv_list.num_shared = mem->validate_list.num_shared;
-	}
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		list_for_each_entry(peer_vm, &process_info->vm_list_head,
+				    vm_list_node) {
+			ret = amdgpu_vm_lock_pd(peer_vm, &exec);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto ttm_reserve_fail;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
-				     false, &duplicate_save);
-	if (ret) {
-		pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
-		goto ttm_reserve_fail;
+		/* Reserve all BOs and page tables/directory. Add all BOs from
+		 * kfd_bo_list to the drm_exec context
+		 */
+		list_for_each_entry(mem, &process_info->kfd_bo_list,
+				    validate_list) {
+			struct drm_gem_object *gobj;
+
+			gobj = &mem->bo->tbo.base;
+			ret = drm_exec_prepare_obj(&exec, gobj, 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(ret))
+				goto ttm_reserve_fail;
+		}
+		drm_exec_continue_on_contention(&exec);
 	}
 
 	amdgpu_sync_create(&sync_obj);
@@ -2759,7 +2695,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 
 	/* Validate BOs and map them to GPUVM (update VM page tables). */
 	list_for_each_entry(mem, &process_info->kfd_bo_list,
-			    validate_list.head) {
+			    validate_list) {
 
 		struct amdgpu_bo *bo = mem->bo;
 		uint32_t domain = mem->domain;
@@ -2832,8 +2768,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	*ef = dma_fence_get(&new_fence->base);
 
 	/* Attach new eviction fence to all BOs except pinned ones */
-	list_for_each_entry(mem, &process_info->kfd_bo_list,
-		validate_list.head) {
+	list_for_each_entry(mem, &process_info->kfd_bo_list, validate_list) {
 		if (mem->bo->tbo.pin_count)
 			continue;
 
@@ -2852,11 +2787,10 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	}
 
 validate_map_fail:
-	ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
 	amdgpu_sync_free(&sync_obj);
 ttm_reserve_fail:
+	drm_exec_fini(&exec);
 	mutex_unlock(&process_info->lock);
-	kfree(pd_bo_list);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3c0310576b3b..594442d23242 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -34,6 +34,7 @@
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_drv.h>
 #include <drm/ttm/ttm_tt.h>
+#include <drm/drm_exec.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -334,6 +335,19 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 	list_add(&entry->tv.head, validated);
 }
 
+/**
+ * amdgpu_vm_lock_pd - lock PD in drm_exec
+ *
+ * @vm: vm providing the BOs
+ * @exec: drm execution context
+ *
+ * Lock the VM root PD in the DRM execution context.
+ */
+int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec)
+{
+	return drm_exec_prepare_obj(exec, &vm->root.bo->tbo.base, 4);
+}
+
 /**
  * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 6f085f0b4ef3..ed987f73e3a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -36,6 +36,8 @@
 #include "amdgpu_ring.h"
 #include "amdgpu_ids.h"
 
+struct drm_exec;
+
 struct amdgpu_bo_va;
 struct amdgpu_job;
 struct amdgpu_bo_list_entry;
@@ -390,6 +392,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 			 struct list_head *validated,
 			 struct amdgpu_bo_list_entry *entry);
+int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec);
 bool amdgpu_vm_ready(struct amdgpu_vm *vm);
 int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			      int (*callback)(void *p, struct amdgpu_bo *bo),
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 96a138a39515..9a7dd6e3f4b3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -24,6 +24,8 @@
 #include <linux/types.h>
 #include <linux/sched/task.h>
 #include <drm/ttm/ttm_tt.h>
+#include <drm/drm_exec.h>
+
 #include "amdgpu_sync.h"
 #include "amdgpu_object.h"
 #include "amdgpu_vm.h"
@@ -1423,9 +1425,7 @@ struct svm_validate_context {
 	struct svm_range *prange;
 	bool intr;
 	DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
-	struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
-	struct list_head validate_list;
-	struct ww_acquire_ctx ticket;
+	struct drm_exec exec;
 };
 
 static int svm_range_reserve_bos(struct svm_validate_context *ctx)
@@ -1435,25 +1435,25 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
 	uint32_t gpuidx;
 	int r;
 
-	INIT_LIST_HEAD(&ctx->validate_list);
-	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
-		pdd = kfd_process_device_from_gpuidx(ctx->process, gpuidx);
-		if (!pdd) {
-			pr_debug("failed to find device idx %d\n", gpuidx);
-			return -EINVAL;
-		}
-		vm = drm_priv_to_vm(pdd->drm_priv);
-
-		ctx->tv[gpuidx].bo = &vm->root.bo->tbo;
-		ctx->tv[gpuidx].num_shared = 4;
-		list_add(&ctx->tv[gpuidx].head, &ctx->validate_list);
-	}
+	drm_exec_init(&ctx->exec, true);
+	drm_exec_while_not_all_locked(&ctx->exec) {
+		for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
+			pdd = kfd_process_device_from_gpuidx(ctx->process, gpuidx);
+			if (!pdd) {
+				pr_debug("failed to find device idx %d\n", gpuidx);
+				r = -EINVAL;
+				goto unreserve_out;
+			}
+			vm = drm_priv_to_vm(pdd->drm_priv);
 
-	r = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->validate_list,
-				   ctx->intr, NULL);
-	if (r) {
-		pr_debug("failed %d to reserve bo\n", r);
-		return r;
+			r = amdgpu_vm_lock_pd(vm, &ctx->exec);
+			drm_exec_break_on_contention(&ctx->exec);
+			if (unlikely(r)) {
+				pr_debug("failed %d to reserve bo\n", r);
+				goto unreserve_out;
+			}
+		}
+		drm_exec_continue_on_contention(&ctx->exec);
 	}
 
 	for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) {
@@ -1476,13 +1476,13 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
 	return 0;
 
 unreserve_out:
-	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
+	drm_exec_fini(&ctx->exec);
 	return r;
 }
 
 static void svm_range_unreserve_bos(struct svm_validate_context *ctx)
 {
-	ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list);
+	drm_exec_fini(&ctx->exec);
 }
 
 static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/13] drm/amdgpu: use drm_exec for GEM and CSA handling
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (2 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 03/13] drm/amdkfd: switch over to using drm_exec v2 Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 05/13] drm/amdgpu: use drm_exec for MES testing Christian König
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Start using the new component here as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 42 ++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 77 +++++++++++--------------
 2 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index c6d4d41c4393..ea434c8de047 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -22,6 +22,8 @@
  * * Author: Monk.liu@amd.com
  */
 
+#include <drm/drm_exec.h>
+
 #include "amdgpu.h"
 
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
@@ -65,31 +67,25 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			  struct amdgpu_bo *bo, struct amdgpu_bo_va **bo_va,
 			  uint64_t csa_addr, uint32_t size)
 {
-	struct ww_acquire_ctx ticket;
-	struct list_head list;
-	struct amdgpu_bo_list_entry pd;
-	struct ttm_validate_buffer csa_tv;
+	struct drm_exec exec;
 	int r;
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&csa_tv.head);
-	csa_tv.bo = &bo->tbo;
-	csa_tv.num_shared = 1;
-
-	list_add(&csa_tv.head, &list);
-	amdgpu_vm_get_pd_bo(vm, &list, &pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
-	if (r) {
-		DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
-		return r;
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		r = amdgpu_vm_lock_pd(vm, &exec);
+		if (likely(!r))
+			r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 0);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r)) {
+			DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
+			goto error;
+		}
 	}
 
 	*bo_va = amdgpu_vm_bo_add(adev, vm, bo);
 	if (!*bo_va) {
-		ttm_eu_backoff_reservation(&ticket, &list);
-		DRM_ERROR("failed to create bo_va for static CSA\n");
-		return -ENOMEM;
+		r = -ENOMEM;
+		goto error;
 	}
 
 	r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
@@ -99,10 +95,10 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r) {
 		DRM_ERROR("failed to do bo_map on static CSA, err=%d\n", r);
 		amdgpu_vm_bo_del(adev, *bo_va);
-		ttm_eu_backoff_reservation(&ticket, &list);
-		return r;
+		goto error;
 	}
 
-	ttm_eu_backoff_reservation(&ticket, &list);
-	return 0;
+error:
+	drm_exec_fini(&exec);
+	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 863cb668e000..c5f74f241366 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -33,6 +33,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/ttm/ttm_tt.h>
 
@@ -197,29 +198,23 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_vm *vm = &fpriv->vm;
 
-	struct amdgpu_bo_list_entry vm_pd;
-	struct list_head list, duplicates;
 	struct dma_fence *fence = NULL;
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
 	struct amdgpu_bo_va *bo_va;
+	struct drm_exec exec;
 	long r;
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&duplicates);
-
-	tv.bo = &bo->tbo;
-	tv.num_shared = 2;
-	list_add(&tv.head, &list);
-
-	amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
-	if (r) {
-		dev_err(adev->dev, "leaking bo va because "
-			"we fail to reserve bo (%ld)\n", r);
-		return;
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 0);
+		if (likely(!r))
+			r = amdgpu_vm_lock_pd(vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r)) {
+			dev_err(adev->dev, "leaking bo va (%ld)\n", r);
+			goto out_unlock;
+		}
 	}
+
 	bo_va = amdgpu_vm_bo_find(vm, bo);
 	if (!bo_va || --bo_va->ref_count)
 		goto out_unlock;
@@ -229,6 +224,9 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 		goto out_unlock;
 
 	r = amdgpu_vm_clear_freed(adev, vm, &fence);
+	if (unlikely(r < 0))
+		dev_err(adev->dev, "failed to clear page "
+			"tables on GEM object close (%ld)\n", r);
 	if (r || !fence)
 		goto out_unlock;
 
@@ -236,10 +234,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	dma_fence_put(fence);
 
 out_unlock:
-	if (unlikely(r < 0))
-		dev_err(adev->dev, "failed to clear page "
-			"tables on GEM object close (%ld)\n", r);
-	ttm_eu_backoff_reservation(&ticket, &list);
+	drm_exec_fini(&exec);
 }
 
 static int amdgpu_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
@@ -673,10 +668,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct amdgpu_bo *abo;
 	struct amdgpu_bo_va *bo_va;
-	struct amdgpu_bo_list_entry vm_pd;
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
-	struct list_head list, duplicates;
+	struct drm_exec exec;
 	uint64_t va_flags;
 	uint64_t vm_size;
 	int r = 0;
@@ -726,36 +718,37 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&duplicates);
 	if ((args->operation != AMDGPU_VA_OP_CLEAR) &&
 	    !(args->flags & AMDGPU_VM_PAGE_PRT)) {
 		gobj = drm_gem_object_lookup(filp, args->handle);
 		if (gobj == NULL)
 			return -ENOENT;
 		abo = gem_to_amdgpu_bo(gobj);
-		tv.bo = &abo->tbo;
-		if (abo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID)
-			tv.num_shared = 1;
-		else
-			tv.num_shared = 0;
-		list_add(&tv.head, &list);
 	} else {
 		gobj = NULL;
 		abo = NULL;
 	}
 
-	amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		if (gobj) {
+			r = drm_exec_prepare_obj(&exec, gobj, 0);
+			drm_exec_continue_on_contention(&exec);
+			if (unlikely(r))
+				goto error;
+		}
 
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
-	if (r)
-		goto error_unref;
+		r = amdgpu_vm_lock_pd(&fpriv->vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto error;
+	}
 
 	if (abo) {
 		bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
 		if (!bo_va) {
 			r = -ENOENT;
-			goto error_backoff;
+			goto error;
 		}
 	} else if (args->operation != AMDGPU_VA_OP_CLEAR) {
 		bo_va = fpriv->prt_va;
@@ -792,10 +785,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
 					args->operation);
 
-error_backoff:
-	ttm_eu_backoff_reservation(&ticket, &list);
-
-error_unref:
+error:
+	drm_exec_fini(&exec);
 	drm_gem_object_put(gobj);
 	return r;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/13] drm/amdgpu: use drm_exec for MES testing
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (3 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 04/13] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2 Christian König
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Start using the new component here as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 86 +++++++++++--------------
 1 file changed, 39 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index f0f00466b59f..bfa9006600dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -22,6 +22,7 @@
  */
 
 #include <linux/firmware.h>
+#include <drm/drm_exec.h>
 
 #include "amdgpu_mes.h"
 #include "amdgpu.h"
@@ -1131,34 +1132,29 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device *adev,
 				 struct amdgpu_mes_ctx_data *ctx_data)
 {
 	struct amdgpu_bo_va *bo_va;
-	struct ww_acquire_ctx ticket;
-	struct list_head list;
-	struct amdgpu_bo_list_entry pd;
-	struct ttm_validate_buffer csa_tv;
 	struct amdgpu_sync sync;
+	struct drm_exec exec;
 	int r;
 
 	amdgpu_sync_create(&sync);
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&csa_tv.head);
 
-	csa_tv.bo = &ctx_data->meta_data_obj->tbo;
-	csa_tv.num_shared = 1;
-
-	list_add(&csa_tv.head, &list);
-	amdgpu_vm_get_pd_bo(vm, &list, &pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
-	if (r) {
-		DRM_ERROR("failed to reserve meta data BO: err=%d\n", r);
-		return r;
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		r = drm_exec_prepare_obj(&exec,
+					 &ctx_data->meta_data_obj->tbo.base,
+					 0);
+		if (likely(!r))
+			r = amdgpu_vm_lock_pd(vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto error_fini_exec;
 	}
 
 	bo_va = amdgpu_vm_bo_add(adev, vm, ctx_data->meta_data_obj);
 	if (!bo_va) {
-		ttm_eu_backoff_reservation(&ticket, &list);
 		DRM_ERROR("failed to create bo_va for meta data BO\n");
-		return -ENOMEM;
+		r = -ENOMEM;
+		goto error_fini_exec;
 	}
 
 	r = amdgpu_vm_bo_map(adev, bo_va, ctx_data->meta_data_gpu_addr, 0,
@@ -1168,33 +1164,35 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device *adev,
 
 	if (r) {
 		DRM_ERROR("failed to do bo_map on meta data, err=%d\n", r);
-		goto error;
+		goto error_del_bo_va;
 	}
 
 	r = amdgpu_vm_bo_update(adev, bo_va, false);
 	if (r) {
 		DRM_ERROR("failed to do vm_bo_update on meta data\n");
-		goto error;
+		goto error_del_bo_va;
 	}
 	amdgpu_sync_fence(&sync, bo_va->last_pt_update);
 
 	r = amdgpu_vm_update_pdes(adev, vm, false);
 	if (r) {
 		DRM_ERROR("failed to update pdes on meta data\n");
-		goto error;
+		goto error_del_bo_va;
 	}
 	amdgpu_sync_fence(&sync, vm->last_update);
 
 	amdgpu_sync_wait(&sync, false);
-	ttm_eu_backoff_reservation(&ticket, &list);
+	drm_exec_fini(&exec);
 
 	amdgpu_sync_free(&sync);
 	ctx_data->meta_data_va = bo_va;
 	return 0;
 
-error:
+error_del_bo_va:
 	amdgpu_vm_bo_del(adev, bo_va);
-	ttm_eu_backoff_reservation(&ticket, &list);
+
+error_fini_exec:
+	drm_exec_fini(&exec);
 	amdgpu_sync_free(&sync);
 	return r;
 }
@@ -1205,34 +1203,28 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
 	struct amdgpu_bo_va *bo_va = ctx_data->meta_data_va;
 	struct amdgpu_bo *bo = ctx_data->meta_data_obj;
 	struct amdgpu_vm *vm = bo_va->base.vm;
-	struct amdgpu_bo_list_entry vm_pd;
-	struct list_head list, duplicates;
-	struct dma_fence *fence = NULL;
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
-	long r = 0;
-
-	INIT_LIST_HEAD(&list);
-	INIT_LIST_HEAD(&duplicates);
-
-	tv.bo = &bo->tbo;
-	tv.num_shared = 2;
-	list_add(&tv.head, &list);
-
-	amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
-
-	r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
-	if (r) {
-		dev_err(adev->dev, "leaking bo va because "
-			"we fail to reserve bo (%ld)\n", r);
-		return r;
+	struct dma_fence *fence;
+	struct drm_exec exec;
+	long r;
+
+	drm_exec_init(&exec, false);
+	drm_exec_while_not_all_locked(&exec) {
+		r = drm_exec_prepare_obj(&exec,
+					 &ctx_data->meta_data_obj->tbo.base,
+					 0);
+		if (likely(!r))
+			r = amdgpu_vm_lock_pd(vm, &exec);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto out_unlock;
 	}
 
 	amdgpu_vm_bo_del(adev, bo_va);
 	if (!amdgpu_vm_ready(vm))
 		goto out_unlock;
 
-	r = dma_resv_get_singleton(bo->tbo.base.resv, DMA_RESV_USAGE_BOOKKEEP, &fence);
+	r = dma_resv_get_singleton(bo->tbo.base.resv, DMA_RESV_USAGE_BOOKKEEP,
+				   &fence);
 	if (r)
 		goto out_unlock;
 	if (fence) {
@@ -1251,7 +1243,7 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
 out_unlock:
 	if (unlikely(r < 0))
 		dev_err(adev->dev, "failed to clear page tables (%ld)\n", r);
-	ttm_eu_backoff_reservation(&ticket, &list);
+	drm_exec_fini(&exec);
 
 	return r;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (4 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 05/13] drm/amdgpu: use drm_exec for MES testing Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-06-12 13:16   ` Tatsuyuki Ishi
  2023-06-20  4:07   ` Tatsuyuki Ishi
  2023-05-04 11:51 ` [PATCH 07/13] drm/radeon: switch over to drm_exec Christian König
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Use the new component here as well and remove the old handling.

v2: drop duplicate handling

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  71 ++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 210 +++++++++-----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
 7 files changed, 115 insertions(+), 204 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 02b827785e39..eba3e4f01ea6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -53,7 +53,6 @@
 
 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_placement.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_gem.h>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 252a876b0725..b6298e901cbd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -28,6 +28,7 @@
  *    Christian König <deathsimple@vodafone.de>
  */
 
+#include <linux/sort.h>
 #include <linux/uaccess.h>
 
 #include "amdgpu.h"
@@ -50,13 +51,20 @@ static void amdgpu_bo_list_free(struct kref *ref)
 						   refcount);
 	struct amdgpu_bo_list_entry *e;
 
-	amdgpu_bo_list_for_each_entry(e, list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	amdgpu_bo_list_for_each_entry(e, list)
+		amdgpu_bo_unref(&e->bo);
+	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);
+}
 
-		amdgpu_bo_unref(&bo);
-	}
+static int amdgpu_bo_list_entry_cmp(const void *_a, const void *_b)
+{
+	const struct amdgpu_bo_list_entry *a = _a, *b = _b;
 
-	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);
+	if (a->priority > b->priority)
+		return 1;
+	if (a->priority < b->priority)
+		return -1;
+	return 0;
 }
 
 int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
@@ -118,7 +126,7 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
-		entry->tv.bo = &bo->tbo;
+		entry->bo = bo;
 
 		if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
 			list->gds_obj = bo;
@@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 
 	list->first_userptr = first_userptr;
 	list->num_entries = num_entries;
+	sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
+	     amdgpu_bo_list_entry_cmp, NULL);
 
 	trace_amdgpu_cs_bo_status(list->num_entries, total_size);
 
@@ -141,16 +151,10 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
 	return 0;
 
 error_free:
-	for (i = 0; i < last_entry; ++i) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
-
-		amdgpu_bo_unref(&bo);
-	}
-	for (i = first_userptr; i < num_entries; ++i) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
-
-		amdgpu_bo_unref(&bo);
-	}
+	for (i = 0; i < last_entry; ++i)
+		amdgpu_bo_unref(&array[i].bo);
+	for (i = first_userptr; i < num_entries; ++i)
+		amdgpu_bo_unref(&array[i].bo);
 	kvfree(list);
 	return r;
 
@@ -182,41 +186,6 @@ int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,
 	return -ENOENT;
 }
 
-void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
-			     struct list_head *validated)
-{
-	/* This is based on the bucket sort with O(n) time complexity.
-	 * An item with priority "i" is added to bucket[i]. The lists are then
-	 * concatenated in descending order.
-	 */
-	struct list_head bucket[AMDGPU_BO_LIST_NUM_BUCKETS];
-	struct amdgpu_bo_list_entry *e;
-	unsigned i;
-
-	for (i = 0; i < AMDGPU_BO_LIST_NUM_BUCKETS; i++)
-		INIT_LIST_HEAD(&bucket[i]);
-
-	/* Since buffers which appear sooner in the relocation list are
-	 * likely to be used more often than buffers which appear later
-	 * in the list, the sort mustn't change the ordering of buffers
-	 * with the same priority, i.e. it must be stable.
-	 */
-	amdgpu_bo_list_for_each_entry(e, list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-		unsigned priority = e->priority;
-
-		if (!bo->parent)
-			list_add_tail(&e->tv.head, &bucket[priority]);
-
-		e->user_pages = NULL;
-		e->range = NULL;
-	}
-
-	/* Connect the sorted buckets in the output list. */
-	for (i = 0; i < AMDGPU_BO_LIST_NUM_BUCKETS; i++)
-		list_splice(&bucket[i], validated);
-}
-
 void amdgpu_bo_list_put(struct amdgpu_bo_list *list)
 {
 	kref_put(&list->refcount, amdgpu_bo_list_free);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index ededdc01ca28..26c01cb131f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -23,7 +23,6 @@
 #ifndef __AMDGPU_BO_LIST_H__
 #define __AMDGPU_BO_LIST_H__
 
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/amdgpu_drm.h>
 
 struct hmm_range;
@@ -36,7 +35,7 @@ struct amdgpu_bo_va;
 struct amdgpu_fpriv;
 
 struct amdgpu_bo_list_entry {
-	struct ttm_validate_buffer	tv;
+	struct amdgpu_bo		*bo;
 	struct amdgpu_bo_va		*bo_va;
 	uint32_t			priority;
 	struct page			**user_pages;
@@ -60,8 +59,6 @@ struct amdgpu_bo_list {
 
 int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,
 		       struct amdgpu_bo_list **result);
-void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
-			     struct list_head *validated);
 void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
 int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
 				      struct drm_amdgpu_bo_list_entry **info_param);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 08eced097bd8..9e751f5d4aa7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -65,6 +65,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p,
 	}
 
 	amdgpu_sync_create(&p->sync);
+	drm_exec_init(&p->exec, true);
 	return 0;
 }
 
@@ -122,7 +123,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
 				   uint32_t *offset)
 {
 	struct drm_gem_object *gobj;
-	struct amdgpu_bo *bo;
 	unsigned long size;
 	int r;
 
@@ -130,21 +130,16 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
 	if (gobj == NULL)
 		return -EINVAL;
 
-	bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
-	p->uf_entry.priority = 0;
-	p->uf_entry.tv.bo = &bo->tbo;
-	/* One for TTM and two for the CS job */
-	p->uf_entry.tv.num_shared = 3;
-
+	p->uf_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
 	drm_gem_object_put(gobj);
 
-	size = amdgpu_bo_size(bo);
+	size = amdgpu_bo_size(p->uf_bo);
 	if (size != PAGE_SIZE || (data->offset + 8) > size) {
 		r = -EINVAL;
 		goto error_unref;
 	}
 
-	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
+	if (amdgpu_ttm_tt_get_usermm(p->uf_bo->tbo.ttm)) {
 		r = -EINVAL;
 		goto error_unref;
 	}
@@ -154,7 +149,7 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
 	return 0;
 
 error_unref:
-	amdgpu_bo_unref(&bo);
+	amdgpu_bo_unref(&p->uf_bo);
 	return r;
 }
 
@@ -310,7 +305,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
 		goto free_all_kdata;
 	}
 
-	if (p->uf_entry.tv.bo)
+	if (p->uf_bo)
 		p->gang_leader->uf_addr = uf_offset;
 	kvfree(chunk_array);
 
@@ -355,7 +350,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
 	ib = &job->ibs[job->num_ibs++];
 
 	/* MM engine doesn't support user fences */
-	if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
+	if (p->uf_bo && ring->funcs->no_user_fence)
 		return -EINVAL;
 
 	if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
@@ -814,55 +809,18 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 	return r;
 }
 
-static int amdgpu_cs_list_validate(struct amdgpu_cs_parser *p,
-			    struct list_head *validated)
-{
-	struct ttm_operation_ctx ctx = { true, false };
-	struct amdgpu_bo_list_entry *lobj;
-	int r;
-
-	list_for_each_entry(lobj, validated, tv.head) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(lobj->tv.bo);
-		struct mm_struct *usermm;
-
-		usermm = amdgpu_ttm_tt_get_usermm(bo->tbo.ttm);
-		if (usermm && usermm != current->mm)
-			return -EPERM;
-
-		if (amdgpu_ttm_tt_is_userptr(bo->tbo.ttm) &&
-		    lobj->user_invalidated && lobj->user_pages) {
-			amdgpu_bo_placement_from_domain(bo,
-							AMDGPU_GEM_DOMAIN_CPU);
-			r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-			if (r)
-				return r;
-
-			amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm,
-						     lobj->user_pages);
-		}
-
-		r = amdgpu_cs_bo_validate(p, bo);
-		if (r)
-			return r;
-
-		kvfree(lobj->user_pages);
-		lobj->user_pages = NULL;
-	}
-	return 0;
-}
-
 static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				union drm_amdgpu_cs *cs)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	struct ttm_operation_ctx ctx = { true, false };
 	struct amdgpu_vm *vm = &fpriv->vm;
 	struct amdgpu_bo_list_entry *e;
-	struct list_head duplicates;
+	struct drm_gem_object *obj;
+	unsigned long index;
 	unsigned int i;
 	int r;
 
-	INIT_LIST_HEAD(&p->validated);
-
 	/* p->bo_list could already be assigned if AMDGPU_CHUNK_ID_BO_HANDLES is present */
 	if (cs->in.bo_list_handle) {
 		if (p->bo_list)
@@ -882,25 +840,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 
 	mutex_lock(&p->bo_list->bo_list_mutex);
 
-	/* One for TTM and one for the CS job */
-	amdgpu_bo_list_for_each_entry(e, p->bo_list)
-		e->tv.num_shared = 2;
-
-	amdgpu_bo_list_get_list(p->bo_list, &p->validated);
-
-	INIT_LIST_HEAD(&duplicates);
-	amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd);
-
-	if (p->uf_entry.tv.bo && !ttm_to_amdgpu_bo(p->uf_entry.tv.bo)->parent)
-		list_add(&p->uf_entry.tv.head, &p->validated);
-
 	/* Get userptr backing pages. If pages are updated after registered
 	 * in amdgpu_gem_userptr_ioctl(), amdgpu_cs_list_validate() will do
 	 * amdgpu_ttm_backend_bind() to flush and invalidate new pages
 	 */
 	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 		bool userpage_invalidated = false;
+		struct amdgpu_bo *bo = e->bo;
 		int i;
 
 		e->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
@@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		e->user_invalidated = userpage_invalidated;
 	}
 
-	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-				   &duplicates);
-	if (unlikely(r != 0)) {
-		if (r != -ERESTARTSYS)
-			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
-		goto out_free_user_pages;
+	drm_exec_while_not_all_locked(&p->exec) {
+		r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
+		drm_exec_continue_on_contention(&p->exec);
+		if (unlikely(r))
+			goto out_free_user_pages;
+
+		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
+			drm_exec_break_on_contention(&p->exec);
+			if (unlikely(r))
+				goto out_free_user_pages;
+
+			e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
+			e->range = NULL;
+		}
+		drm_exec_continue_on_contention(&p->exec);
+
+		if (p->uf_bo) {
+			r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
+						 2);
+			drm_exec_continue_on_contention(&p->exec);
+			if (unlikely(r))
+				goto out_free_user_pages;
+		}
 	}
 
-	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
+		struct mm_struct *usermm;
 
-		e->bo_va = amdgpu_vm_bo_find(vm, bo);
+		usermm = amdgpu_ttm_tt_get_usermm(e->bo->tbo.ttm);
+		if (usermm && usermm != current->mm) {
+			r = -EPERM;
+			goto out_free_user_pages;
+		}
+
+		if (amdgpu_ttm_tt_is_userptr(e->bo->tbo.ttm) &&
+		    e->user_invalidated && e->user_pages) {
+			amdgpu_bo_placement_from_domain(e->bo,
+							AMDGPU_GEM_DOMAIN_CPU);
+			r = ttm_bo_validate(&e->bo->tbo, &e->bo->placement,
+					    &ctx);
+			if (r)
+				goto out_free_user_pages;
+
+			amdgpu_ttm_tt_set_user_pages(e->bo->tbo.ttm,
+						     e->user_pages);
+		}
+
+		kvfree(e->user_pages);
+		e->user_pages = NULL;
 	}
 
 	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
@@ -951,25 +935,21 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				      amdgpu_cs_bo_validate, p);
 	if (r) {
 		DRM_ERROR("amdgpu_vm_validate_pt_bos() failed.\n");
-		goto error_validate;
+		goto out_free_user_pages;
 	}
 
-	r = amdgpu_cs_list_validate(p, &duplicates);
-	if (r)
-		goto error_validate;
-
-	r = amdgpu_cs_list_validate(p, &p->validated);
-	if (r)
-		goto error_validate;
-
-	if (p->uf_entry.tv.bo) {
-		struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo);
+	drm_exec_for_each_locked_object(&p->exec, index, obj) {
+		r = amdgpu_cs_bo_validate(p, gem_to_amdgpu_bo(obj));
+		if (unlikely(r))
+			goto out_free_user_pages;
+	}
 
-		r = amdgpu_ttm_alloc_gart(&uf->tbo);
-		if (r)
-			goto error_validate;
+	if (p->uf_bo) {
+		r = amdgpu_ttm_alloc_gart(&p->uf_bo->tbo);
+		if (unlikely(r))
+			goto out_free_user_pages;
 
-		p->gang_leader->uf_addr += amdgpu_bo_gpu_offset(uf);
+		p->gang_leader->uf_addr += amdgpu_bo_gpu_offset(p->uf_bo);
 	}
 
 	amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
@@ -981,12 +961,9 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 					 p->bo_list->oa_obj);
 	return 0;
 
-error_validate:
-	ttm_eu_backoff_reservation(&p->ticket, &p->validated);
-
 out_free_user_pages:
 	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+		struct amdgpu_bo *bo = e->bo;
 
 		if (!e->user_pages)
 			continue;
@@ -1093,7 +1070,6 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	struct amdgpu_vm *vm = &fpriv->vm;
 	struct amdgpu_bo_list_entry *e;
 	struct amdgpu_bo_va *bo_va;
-	struct amdgpu_bo *bo;
 	unsigned int i;
 	int r;
 
@@ -1122,11 +1098,6 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	}
 
 	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-		/* ignore duplicates */
-		bo = ttm_to_amdgpu_bo(e->tv.bo);
-		if (!bo)
-			continue;
-
 		bo_va = e->bo_va;
 		if (bo_va == NULL)
 			continue;
@@ -1164,7 +1135,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	if (amdgpu_vm_debug) {
 		/* Invalidate all BOs to test for userspace bugs */
 		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-			struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+			struct amdgpu_bo *bo = e->bo;
 
 			/* ignore duplicates */
 			if (!bo)
@@ -1181,8 +1152,9 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
 	struct drm_gpu_scheduler *sched;
-	struct amdgpu_bo_list_entry *e;
+	struct drm_gem_object *obj;
 	struct dma_fence *fence;
+	unsigned long index;
 	unsigned int i;
 	int r;
 
@@ -1193,8 +1165,9 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 		return r;
 	}
 
-	list_for_each_entry(e, &p->validated, tv.head) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+	drm_exec_for_each_locked_object(&p->exec, index, obj) {
+		struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+
 		struct dma_resv *resv = bo->tbo.base.resv;
 		enum amdgpu_sync_mode sync_mode;
 
@@ -1258,6 +1231,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
 	struct amdgpu_job *leader = p->gang_leader;
 	struct amdgpu_bo_list_entry *e;
+	struct drm_gem_object *gobj;
+	unsigned long index;
 	unsigned int i;
 	uint64_t seq;
 	int r;
@@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	 */
 	r = 0;
 	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-
-		r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, e->range);
+		r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
+							e->range);
 		e->range = NULL;
 	}
 	if (r) {
@@ -1307,20 +1281,22 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	}
 
 	p->fence = dma_fence_get(&leader->base.s_fence->finished);
-	list_for_each_entry(e, &p->validated, tv.head) {
+	drm_exec_for_each_locked_object(&p->exec, index, gobj) {
+
+		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
 
 		/* Everybody except for the gang leader uses READ */
 		for (i = 0; i < p->gang_size; ++i) {
 			if (p->jobs[i] == leader)
 				continue;
 
-			dma_resv_add_fence(e->tv.bo->base.resv,
+			dma_resv_add_fence(gobj->resv,
 					   &p->jobs[i]->base.s_fence->finished,
 					   DMA_RESV_USAGE_READ);
 		}
 
-		/* The gang leader is remembered as writer */
-		e->tv.num_shared = 0;
+		/* The gang leader is remembered as writer */
+		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
 	}
 
 	seq = amdgpu_ctx_add_fence(p->ctx, p->entities[p->gang_leader_idx],
@@ -1336,7 +1312,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	cs->out.handle = seq;
 	leader->uf_sequence = seq;
 
-	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
+	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->exec.ticket);
 	for (i = 0; i < p->gang_size; ++i) {
 		amdgpu_job_free_resources(p->jobs[i]);
 		trace_amdgpu_cs_ioctl(p->jobs[i]);
@@ -1345,7 +1321,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	}
 
 	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
-	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
 
 	mutex_unlock(&p->adev->notifier_lock);
 	mutex_unlock(&p->bo_list->bo_list_mutex);
@@ -1366,6 +1341,8 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser)
 	unsigned i;
 
 	amdgpu_sync_free(&parser->sync);
+	drm_exec_fini(&parser->exec);
+
 	for (i = 0; i < parser->num_post_deps; i++) {
 		drm_syncobj_put(parser->post_deps[i].syncobj);
 		kfree(parser->post_deps[i].chain);
@@ -1386,11 +1363,7 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser)
 		if (parser->jobs[i])
 			amdgpu_job_free(parser->jobs[i]);
 	}
-	if (parser->uf_entry.tv.bo) {
-		struct amdgpu_bo *uf = ttm_to_amdgpu_bo(parser->uf_entry.tv.bo);
-
-		amdgpu_bo_unref(&uf);
-	}
+	amdgpu_bo_unref(&parser->uf_bo);
 }
 
 int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
@@ -1451,7 +1424,6 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	return 0;
 
 error_backoff:
-	ttm_eu_backoff_reservation(&parser.ticket, &parser.validated);
 	mutex_unlock(&parser.bo_list->bo_list_mutex);
 
 error_fini:
@@ -1786,7 +1758,7 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
 	*map = mapping;
 
 	/* Double check that the BO is reserved by this CS */
-	if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->ticket)
+	if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket)
 		return -EINVAL;
 
 	if (!((*bo)->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
index fb3e3d56d427..39c33ad100cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
@@ -24,6 +24,7 @@
 #define __AMDGPU_CS_H__
 
 #include <linux/ww_mutex.h>
+#include <drm/drm_exec.h>
 
 #include "amdgpu_job.h"
 #include "amdgpu_bo_list.h"
@@ -62,11 +63,9 @@ struct amdgpu_cs_parser {
 	struct amdgpu_job	*gang_leader;
 
 	/* buffer objects */
-	struct ww_acquire_ctx		ticket;
+	struct drm_exec			exec;
 	struct amdgpu_bo_list		*bo_list;
 	struct amdgpu_mn		*mn;
-	struct amdgpu_bo_list_entry	vm_pd;
-	struct list_head		validated;
 	struct dma_fence		*fence;
 	uint64_t			bytes_moved_threshold;
 	uint64_t			bytes_moved_vis_threshold;
@@ -74,7 +73,7 @@ struct amdgpu_cs_parser {
 	uint64_t			bytes_moved_vis;
 
 	/* user fence */
-	struct amdgpu_bo_list_entry	uf_entry;
+	struct amdgpu_bo		*uf_bo;
 
 	unsigned			num_post_deps;
 	struct amdgpu_cs_post_dep	*post_deps;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 594442d23242..1dbe75b008a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -313,28 +313,6 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	amdgpu_vm_bo_evicted(base);
 }
 
-/**
- * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
- *
- * @vm: vm providing the BOs
- * @validated: head of validation list
- * @entry: entry to add
- *
- * Add the page directory to the list of BOs to
- * validate for command submission.
- */
-void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
-			 struct list_head *validated,
-			 struct amdgpu_bo_list_entry *entry)
-{
-	entry->priority = 0;
-	entry->tv.bo = &vm->root.bo->tbo;
-	/* Two for VM updates, one for TTM and one for the CS job */
-	entry->tv.num_shared = 4;
-	entry->user_pages = NULL;
-	list_add(&entry->tv.head, validated);
-}
-
 /**
  * amdgpu_vm_lock_pd - lock PD in drm_exec
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ed987f73e3a6..38902fbce8f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -389,9 +389,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
-void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
-			 struct list_head *validated,
-			 struct amdgpu_bo_list_entry *entry);
 int amdgpu_vm_lock_pd(struct amdgpu_vm *vm, struct drm_exec *exec);
 bool amdgpu_vm_ready(struct amdgpu_vm *vm);
 int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 07/13] drm/radeon: switch over to drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (5 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2 Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 08/13] drm/qxl: switch to using drm_exec Christian König
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/Kconfig         |  1 +
 drivers/gpu/drm/radeon/radeon.h        |  7 ++--
 drivers/gpu/drm/radeon/radeon_cs.c     | 45 +++++++++++++-------------
 drivers/gpu/drm/radeon/radeon_gem.c    | 40 +++++++++++++----------
 drivers/gpu/drm/radeon/radeon_object.c | 25 +++++++-------
 drivers/gpu/drm/radeon/radeon_object.h |  2 +-
 drivers/gpu/drm/radeon/radeon_vm.c     | 10 +++---
 7 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig
index e19d77d58810..2d5fb6240cec 100644
--- a/drivers/gpu/drm/radeon/Kconfig
+++ b/drivers/gpu/drm/radeon/Kconfig
@@ -11,6 +11,7 @@ config DRM_RADEON
 	select DRM_SUBALLOC_HELPER
         select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	select POWER_SUPPLY
 	select HWMON
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8afb03bbce29..37a932a5195f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -75,8 +75,8 @@
 
 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_placement.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 
+#include <drm/drm_exec.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_audio_component.h>
 #include <drm/drm_suballoc.h>
@@ -458,7 +458,8 @@ struct radeon_mman {
 
 struct radeon_bo_list {
 	struct radeon_bo		*robj;
-	struct ttm_validate_buffer	tv;
+	struct list_head		list;
+	bool				shared;
 	uint64_t			gpu_offset;
 	unsigned			preferred_domains;
 	unsigned			allowed_domains;
@@ -1031,6 +1032,7 @@ struct radeon_cs_parser {
 	struct radeon_bo_list	*vm_bos;
 	struct list_head	validated;
 	unsigned		dma_reloc_idx;
+	struct drm_exec		exec;
 	/* indices of various chunks */
 	struct radeon_cs_chunk  *chunk_ib;
 	struct radeon_cs_chunk  *chunk_relocs;
@@ -1044,7 +1046,6 @@ struct radeon_cs_parser {
 	u32			cs_flags;
 	u32			ring;
 	s32			priority;
-	struct ww_acquire_ctx	ticket;
 };
 
 static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 46a27ebf4588..5c681a44cec7 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -182,11 +182,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 			}
 		}
 
-		p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
-		p->relocs[i].tv.num_shared = !r->write_domain;
-
-		radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
-				      priority);
+		p->relocs[i].shared = !r->write_domain;
+		radeon_cs_buckets_add(&buckets, &p->relocs[i].list, priority);
 	}
 
 	radeon_cs_buckets_get_list(&buckets, &p->validated);
@@ -197,7 +194,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 	if (need_mmap_lock)
 		mmap_read_lock(current->mm);
 
-	r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
+	r = radeon_bo_list_validate(p->rdev, &p->exec, &p->validated, p->ring);
 
 	if (need_mmap_lock)
 		mmap_read_unlock(current->mm);
@@ -253,12 +250,11 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 	struct radeon_bo_list *reloc;
 	int r;
 
-	list_for_each_entry(reloc, &p->validated, tv.head) {
+	list_for_each_entry(reloc, &p->validated, list) {
 		struct dma_resv *resv;
 
 		resv = reloc->robj->tbo.base.resv;
-		r = radeon_sync_resv(p->rdev, &p->ib.sync, resv,
-				     reloc->tv.num_shared);
+		r = radeon_sync_resv(p->rdev, &p->ib.sync, resv, reloc->shared);
 		if (r)
 			return r;
 	}
@@ -275,6 +271,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	s32 priority = 0;
 
 	INIT_LIST_HEAD(&p->validated);
+	drm_exec_init(&p->exec, true);
 
 	if (!cs->num_chunks) {
 		return 0;
@@ -396,8 +393,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 static int cmp_size_smaller_first(void *priv, const struct list_head *a,
 				  const struct list_head *b)
 {
-	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head);
-	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head);
+	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, list);
+	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, list);
 
 	/* Sort A before B if A is smaller. */
 	if (la->robj->tbo.base.size > lb->robj->tbo.base.size)
@@ -416,11 +413,13 @@ static int cmp_size_smaller_first(void *priv, const struct list_head *a,
  * If error is set than unvalidate buffer, otherwise just free memory
  * used by parsing context.
  **/
-static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bool backoff)
+static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 {
 	unsigned i;
 
 	if (!error) {
+		struct radeon_bo_list *reloc;
+
 		/* Sort the buffer list from the smallest to largest buffer,
 		 * which affects the order of buffers in the LRU list.
 		 * This assures that the smallest buffers are added first
@@ -432,15 +431,17 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 		 * per frame under memory pressure.
 		 */
 		list_sort(NULL, &parser->validated, cmp_size_smaller_first);
-
-		ttm_eu_fence_buffer_objects(&parser->ticket,
-					    &parser->validated,
-					    &parser->ib.fence->base);
-	} else if (backoff) {
-		ttm_eu_backoff_reservation(&parser->ticket,
-					   &parser->validated);
+		list_for_each_entry(reloc, &parser->validated, list) {
+			dma_resv_add_fence(reloc->robj->tbo.base.resv,
+					   &parser->ib.fence->base,
+					   reloc->shared ?
+					   DMA_RESV_USAGE_READ :
+					   DMA_RESV_USAGE_WRITE);
+		}
 	}
 
+	drm_exec_fini(&parser->exec);
+
 	if (parser->relocs != NULL) {
 		for (i = 0; i < parser->nrelocs; i++) {
 			struct radeon_bo *bo = parser->relocs[i].robj;
@@ -692,7 +693,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	r = radeon_cs_parser_init(&parser, data);
 	if (r) {
 		DRM_ERROR("Failed to initialize parser !\n");
-		radeon_cs_parser_fini(&parser, r, false);
+		radeon_cs_parser_fini(&parser, r);
 		up_read(&rdev->exclusive_lock);
 		r = radeon_cs_handle_lockup(rdev, r);
 		return r;
@@ -706,7 +707,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	}
 
 	if (r) {
-		radeon_cs_parser_fini(&parser, r, false);
+		radeon_cs_parser_fini(&parser, r);
 		up_read(&rdev->exclusive_lock);
 		r = radeon_cs_handle_lockup(rdev, r);
 		return r;
@@ -723,7 +724,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 	}
 out:
-	radeon_cs_parser_fini(&parser, r, true);
+	radeon_cs_parser_fini(&parser, r);
 	up_read(&rdev->exclusive_lock);
 	r = radeon_cs_handle_lockup(rdev, r);
 	return r;
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index bdc5af23f005..332c5439c10f 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -625,33 +625,41 @@ int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 static void radeon_gem_va_update_vm(struct radeon_device *rdev,
 				    struct radeon_bo_va *bo_va)
 {
-	struct ttm_validate_buffer tv, *entry;
-	struct radeon_bo_list *vm_bos;
-	struct ww_acquire_ctx ticket;
+	struct radeon_bo_list *vm_bos, *entry;
 	struct list_head list;
+	struct drm_exec exec;
 	unsigned domain;
 	int r;
 
 	INIT_LIST_HEAD(&list);
 
-	tv.bo = &bo_va->bo->tbo;
-	tv.num_shared = 1;
-	list_add(&tv.head, &list);
-
 	vm_bos = radeon_vm_get_bos(rdev, bo_va->vm, &list);
 	if (!vm_bos)
 		return;
 
-	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
-	if (r)
-		goto error_free;
+	drm_exec_init(&exec, true);
+	drm_exec_while_not_all_locked(&exec) {
+		list_for_each_entry(entry, &list, list) {
+			r = drm_exec_prepare_obj(&exec, &entry->robj->tbo.base,
+						 1);
+			drm_exec_break_on_contention(&exec);
+			if (unlikely(r))
+				goto error_cleanup;
+		}
+		drm_exec_continue_on_contention(&exec);
 
-	list_for_each_entry(entry, &list, head) {
-		domain = radeon_mem_type_to_domain(entry->bo->resource->mem_type);
+		r = drm_exec_prepare_obj(&exec, &bo_va->bo->tbo.base, 1);
+		drm_exec_continue_on_contention(&exec);
+		if (unlikely(r))
+			goto error_cleanup;
+	}
+
+	list_for_each_entry(entry, &list, list) {
+		domain = radeon_mem_type_to_domain(entry->robj->tbo.resource->mem_type);
 		/* if anything is swapped out don't swap it in here,
 		   just abort and wait for the next CS */
 		if (domain == RADEON_GEM_DOMAIN_CPU)
-			goto error_unreserve;
+			goto error_cleanup;
 	}
 
 	mutex_lock(&bo_va->vm->mutex);
@@ -665,10 +673,8 @@ static void radeon_gem_va_update_vm(struct radeon_device *rdev,
 error_unlock:
 	mutex_unlock(&bo_va->vm->mutex);
 
-error_unreserve:
-	ttm_eu_backoff_reservation(&ticket, &list);
-
-error_free:
+error_cleanup:
+	drm_exec_fini(&exec);
 	kvfree(vm_bos);
 
 	if (r && r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 10c0fbd9d2b4..508a6e9a2dca 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -468,23 +468,26 @@ static u64 radeon_bo_get_threshold_for_moves(struct radeon_device *rdev)
 }
 
 int radeon_bo_list_validate(struct radeon_device *rdev,
-			    struct ww_acquire_ctx *ticket,
+			    struct drm_exec *exec,
 			    struct list_head *head, int ring)
 {
 	struct ttm_operation_ctx ctx = { true, false };
 	struct radeon_bo_list *lobj;
-	struct list_head duplicates;
-	int r;
 	u64 bytes_moved = 0, initial_bytes_moved;
 	u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
+	int r;
 
-	INIT_LIST_HEAD(&duplicates);
-	r = ttm_eu_reserve_buffers(ticket, head, true, &duplicates);
-	if (unlikely(r != 0)) {
-		return r;
+	drm_exec_while_not_all_locked(exec) {
+		list_for_each_entry(lobj, head, list) {
+			r = drm_exec_prepare_obj(exec, &lobj->robj->tbo.base,
+						 1);
+			drm_exec_break_on_contention(exec);
+			if (unlikely(r && r != -EALREADY))
+				return r;
+		}
 	}
 
-	list_for_each_entry(lobj, head, tv.head) {
+	list_for_each_entry(lobj, head, list) {
 		struct radeon_bo *bo = lobj->robj;
 		if (!bo->tbo.pin_count) {
 			u32 domain = lobj->preferred_domains;
@@ -523,7 +526,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 					domain = lobj->allowed_domains;
 					goto retry;
 				}
-				ttm_eu_backoff_reservation(ticket, head);
 				return r;
 			}
 		}
@@ -531,11 +533,6 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 		lobj->tiling_flags = bo->tiling_flags;
 	}
 
-	list_for_each_entry(lobj, &duplicates, tv.head) {
-		lobj->gpu_offset = radeon_bo_gpu_offset(lobj->robj);
-		lobj->tiling_flags = lobj->robj->tiling_flags;
-	}
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 39cc87a59a9a..d7bbb52db546 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -152,7 +152,7 @@ extern void radeon_bo_force_delete(struct radeon_device *rdev);
 extern int radeon_bo_init(struct radeon_device *rdev);
 extern void radeon_bo_fini(struct radeon_device *rdev);
 extern int radeon_bo_list_validate(struct radeon_device *rdev,
-				   struct ww_acquire_ctx *ticket,
+				   struct drm_exec *exec,
 				   struct list_head *head, int ring);
 extern int radeon_bo_set_tiling_flags(struct radeon_bo *bo,
 				u32 tiling_flags, u32 pitch);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 987cabbf1318..647c4a07d92a 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -142,10 +142,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].robj = vm->page_directory;
 	list[0].preferred_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
-	list[0].tv.bo = &vm->page_directory->tbo;
-	list[0].tv.num_shared = 1;
+	list[0].shared = true;
 	list[0].tiling_flags = 0;
-	list_add(&list[0].tv.head, head);
+	list_add(&list[0].list, head);
 
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
@@ -154,10 +153,9 @@ struct radeon_bo_list *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].robj = vm->page_tables[i].bo;
 		list[idx].preferred_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
-		list[idx].tv.bo = &list[idx].robj->tbo;
-		list[idx].tv.num_shared = 1;
+		list[idx].shared = true;
 		list[idx].tiling_flags = 0;
-		list_add(&list[idx++].tv.head, head);
+		list_add(&list[idx++].list, head);
 	}
 
 	return list;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 08/13] drm/qxl: switch to using drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (6 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 07/13] drm/radeon: switch over to drm_exec Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-06-20  9:13   ` Thomas Zimmermann
  2023-05-04 11:51 ` [PATCH 09/13] drm/lima: " Christian König
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Only compile tested for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/qxl/Kconfig       |  1 +
 drivers/gpu/drm/qxl/qxl_drv.h     |  7 ++--
 drivers/gpu/drm/qxl/qxl_release.c | 67 ++++++++++++++++---------------
 3 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/qxl/Kconfig b/drivers/gpu/drm/qxl/Kconfig
index ca3f51c2a8fe..9c8e433be33e 100644
--- a/drivers/gpu/drm/qxl/Kconfig
+++ b/drivers/gpu/drm/qxl/Kconfig
@@ -5,6 +5,7 @@ config DRM_QXL
 	select DRM_KMS_HELPER
 	select DRM_TTM
 	select DRM_TTM_HELPER
+	select DRM_EXEC
 	select CRC32
 	help
 	  QXL virtual GPU for Spice virtualization desktop integration.
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index ea993d7162e8..3e732648b332 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -38,12 +38,12 @@
 
 #include <drm/drm_crtc.h>
 #include <drm/drm_encoder.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/drm_ioctl.h>
 #include <drm/drm_gem.h>
 #include <drm/qxl_drm.h>
 #include <drm/ttm/ttm_bo.h>
-#include <drm/ttm/ttm_execbuf_util.h>
 #include <drm/ttm/ttm_placement.h>
 
 #include "qxl_dev.h"
@@ -101,7 +101,8 @@ struct qxl_gem {
 };
 
 struct qxl_bo_list {
-	struct ttm_validate_buffer tv;
+	struct qxl_bo		*bo;
+	struct list_head	list;
 };
 
 struct qxl_crtc {
@@ -151,7 +152,7 @@ struct qxl_release {
 	struct qxl_bo *release_bo;
 	uint32_t release_offset;
 	uint32_t surface_release_id;
-	struct ww_acquire_ctx ticket;
+	struct drm_exec	exec;
 	struct list_head bos;
 };
 
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 368d26da0d6a..da7cd9cd58f9 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -121,13 +121,11 @@ qxl_release_free_list(struct qxl_release *release)
 {
 	while (!list_empty(&release->bos)) {
 		struct qxl_bo_list *entry;
-		struct qxl_bo *bo;
 
 		entry = container_of(release->bos.next,
-				     struct qxl_bo_list, tv.head);
-		bo = to_qxl_bo(entry->tv.bo);
-		qxl_bo_unref(&bo);
-		list_del(&entry->tv.head);
+				     struct qxl_bo_list, list);
+		qxl_bo_unref(&entry->bo);
+		list_del(&entry->list);
 		kfree(entry);
 	}
 	release->release_bo = NULL;
@@ -172,8 +170,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 {
 	struct qxl_bo_list *entry;
 
-	list_for_each_entry(entry, &release->bos, tv.head) {
-		if (entry->tv.bo == &bo->tbo)
+	list_for_each_entry(entry, &release->bos, list) {
+		if (entry->bo == bo)
 			return 0;
 	}
 
@@ -182,9 +180,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 		return -ENOMEM;
 
 	qxl_bo_ref(bo);
-	entry->tv.bo = &bo->tbo;
-	entry->tv.num_shared = 0;
-	list_add_tail(&entry->tv.head, &release->bos);
+	entry->bo = bo;
+	list_add_tail(&entry->list, &release->bos);
 	return 0;
 }
 
@@ -221,21 +218,27 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	if (list_is_singular(&release->bos))
 		return 0;
 
-	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
-				     !no_intr, NULL);
-	if (ret)
-		return ret;
-
-	list_for_each_entry(entry, &release->bos, tv.head) {
-		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
-
-		ret = qxl_release_validate_bo(bo);
-		if (ret) {
-			ttm_eu_backoff_reservation(&release->ticket, &release->bos);
-			return ret;
+	drm_exec_init(&release->exec, !no_intr);
+	drm_exec_while_not_all_locked(&release->exec) {
+		list_for_each_entry(entry, &release->bos, list) {
+			ret = drm_exec_prepare_obj(&release->exec,
+						   &entry->bo->tbo.base,
+						   1);
+			drm_exec_break_on_contention(&release->exec);
+			if (ret)
+				goto error;
 		}
 	}
+
+	list_for_each_entry(entry, &release->bos, list) {
+		ret = qxl_release_validate_bo(entry->bo);
+		if (ret)
+			goto error;
+	}
 	return 0;
+error:
+	drm_exec_fini(&release->exec);
+	return ret;
 }
 
 void qxl_release_backoff_reserve_list(struct qxl_release *release)
@@ -245,7 +248,7 @@ void qxl_release_backoff_reserve_list(struct qxl_release *release)
 	if (list_is_singular(&release->bos))
 		return;
 
-	ttm_eu_backoff_reservation(&release->ticket, &release->bos);
+	drm_exec_fini(&release->exec);
 }
 
 int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,
@@ -404,18 +407,18 @@ void qxl_release_unmap(struct qxl_device *qdev,
 
 void qxl_release_fence_buffer_objects(struct qxl_release *release)
 {
-	struct ttm_buffer_object *bo;
 	struct ttm_device *bdev;
-	struct ttm_validate_buffer *entry;
+	struct qxl_bo_list *entry;
 	struct qxl_device *qdev;
+	struct qxl_bo *bo;
 
 	/* if only one object on the release its the release itself
 	   since these objects are pinned no need to reserve */
 	if (list_is_singular(&release->bos) || list_empty(&release->bos))
 		return;
 
-	bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo;
-	bdev = bo->bdev;
+	bo = list_first_entry(&release->bos, struct qxl_bo_list, list)->bo;
+	bdev = bo->tbo.bdev;
 	qdev = container_of(bdev, struct qxl_device, mman.bdev);
 
 	/*
@@ -426,14 +429,12 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 		       release->id | 0xf0000000, release->base.seqno);
 	trace_dma_fence_emit(&release->base);
 
-	list_for_each_entry(entry, &release->bos, head) {
+	list_for_each_entry(entry, &release->bos, list) {
 		bo = entry->bo;
 
-		dma_resv_add_fence(bo->base.resv, &release->base,
+		dma_resv_add_fence(bo->tbo.base.resv, &release->base,
 				   DMA_RESV_USAGE_READ);
-		ttm_bo_move_to_lru_tail_unlocked(bo);
-		dma_resv_unlock(bo->base.resv);
+		ttm_bo_move_to_lru_tail_unlocked(&bo->tbo);
 	}
-	ww_acquire_fini(&release->ticket);
+	drm_exec_fini(&release->exec);
 }
-
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/13] drm/lima: switch to using drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (7 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 08/13] drm/qxl: switch to using drm_exec Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 10/13] drm/virtgpu: " Christian König
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Only compile tested for now.
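
For reference, this and the following conversions replace
drm_gem_lock_reservations()/drm_gem_unlock_reservations() with the array
helper from the new drm_exec component. A minimal sketch of that pattern,
assuming the drm_exec API as used in this series (everything apart from the
drm_exec calls is illustrative):

	#include <drm/drm_exec.h>
	#include <drm/drm_gem.h>

	/*
	 * Sketch: lock an array of GEM objects, reserving one fence slot on
	 * each reservation object, and drop all locks again with one call.
	 * The contention/backoff handling is done inside the helper.
	 */
	static int sketch_lock_array(struct drm_gem_object **objs,
				     unsigned int count)
	{
		struct drm_exec exec;
		int ret;

		drm_exec_init(&exec, true /* interruptible */);

		ret = drm_exec_prepare_array(&exec, objs, count, 1);
		if (ret)
			goto out;

		/* ... arm and push the job, add fences to the resvs ... */

	out:
		drm_exec_fini(&exec);
		return ret;
	}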

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/lima/Kconfig    |  1 +
 drivers/gpu/drm/lima/lima_gem.c | 15 +++++++--------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/lima/Kconfig b/drivers/gpu/drm/lima/Kconfig
index fa1d4f5df31e..1d2871d9ddd2 100644
--- a/drivers/gpu/drm/lima/Kconfig
+++ b/drivers/gpu/drm/lima/Kconfig
@@ -9,6 +9,7 @@ config DRM_LIMA
        depends on COMMON_CLK
        depends on OF
        select DRM_SCHED
+       select DRM_EXEC
        select DRM_GEM_SHMEM_HELPER
        select PM_DEVFREQ
        select DEVFREQ_GOV_SIMPLE_ONDEMAND
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index 10252dc11a22..f48c1edff07d 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -8,6 +8,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/dma-mapping.h>
 
+#include <drm/drm_exec.h>
 #include <drm/drm_file.h>
 #include <drm/drm_syncobj.h>
 #include <drm/drm_utils.h>
@@ -292,7 +293,7 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
 int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 {
 	int i, err = 0;
-	struct ww_acquire_ctx ctx;
+	struct drm_exec exec;
 	struct lima_drm_priv *priv = to_lima_drm_priv(file);
 	struct lima_vm *vm = priv->vm;
 	struct drm_syncobj *out_sync = NULL;
@@ -329,8 +330,9 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 		bos[i] = bo;
 	}
 
-	err = drm_gem_lock_reservations((struct drm_gem_object **)bos,
-					submit->nr_bos, &ctx);
+	drm_exec_init(&exec, true);
+	err = drm_exec_prepare_array(&exec, (struct drm_gem_object **)bos,
+				     submit->nr_bos, 0);
 	if (err)
 		goto err_out0;
 
@@ -360,9 +362,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 				   submit->bos[i].flags & LIMA_SUBMIT_BO_WRITE ?
 				   DMA_RESV_USAGE_WRITE : DMA_RESV_USAGE_READ);
 	}
-
-	drm_gem_unlock_reservations((struct drm_gem_object **)bos,
-				    submit->nr_bos, &ctx);
+	drm_exec_fini(&exec);
 
 	for (i = 0; i < submit->nr_bos; i++)
 		drm_gem_object_put(&bos[i]->base.base);
@@ -379,8 +379,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 err_out2:
 	lima_sched_task_fini(submit->task);
 err_out1:
-	drm_gem_unlock_reservations((struct drm_gem_object **)bos,
-				    submit->nr_bos, &ctx);
+	drm_exec_fini(&exec);
 err_out0:
 	for (i = 0; i < submit->nr_bos; i++) {
 		if (!bos[i])
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/13] drm/virtgpu: switch to using drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (8 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 09/13] drm/lima: " Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 11/13] drm/panfrost: " Christian König
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Only compile tested for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/virtio/Kconfig       |  1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h |  3 ++-
 drivers/gpu/drm/virtio/virtgpu_gem.c | 29 +++-------------------------
 3 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/virtio/Kconfig b/drivers/gpu/drm/virtio/Kconfig
index ea06ff2aa4b4..a24a1ce5e666 100644
--- a/drivers/gpu/drm/virtio/Kconfig
+++ b/drivers/gpu/drm/virtio/Kconfig
@@ -5,6 +5,7 @@ config DRM_VIRTIO_GPU
 	select VIRTIO
 	select DRM_KMS_HELPER
 	select DRM_GEM_SHMEM_HELPER
+	select DRM_EXEC
 	select VIRTIO_DMA_SHARED_BUFFER
 	help
 	   This is the virtual GPU driver for virtio.  It can be used with
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index af6ffb696086..c12434222e51 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -35,6 +35,7 @@
 #include <drm/drm_atomic.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_encoder.h>
+#include <drm/drm_exec.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_framebuffer.h>
 #include <drm/drm_gem.h>
@@ -116,7 +117,7 @@ struct virtio_gpu_object_vram {
 	container_of((virtio_gpu_object), struct virtio_gpu_object_vram, base)
 
 struct virtio_gpu_object_array {
-	struct ww_acquire_ctx ticket;
+	struct drm_exec exec;
 	struct list_head next;
 	u32 nents, total;
 	struct drm_gem_object *objs[];
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index 7db48d17ee3a..bcab407074f4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -171,6 +171,7 @@ struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents)
 
 	objs->nents = 0;
 	objs->total = nents;
+	drm_exec_init(&objs->exec, true);
 	return objs;
 }
 
@@ -214,36 +215,12 @@ void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
 
 int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
 {
-	unsigned int i;
-	int ret;
-
-	if (objs->nents == 1) {
-		ret = dma_resv_lock_interruptible(objs->objs[0]->resv, NULL);
-	} else {
-		ret = drm_gem_lock_reservations(objs->objs, objs->nents,
-						&objs->ticket);
-	}
-	if (ret)
-		return ret;
-
-	for (i = 0; i < objs->nents; ++i) {
-		ret = dma_resv_reserve_fences(objs->objs[i]->resv, 1);
-		if (ret) {
-			virtio_gpu_array_unlock_resv(objs);
-			return ret;
-		}
-	}
-	return ret;
+	return drm_exec_prepare_array(&objs->exec, objs->objs, objs->nents, 1);
 }
 
 void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs)
 {
-	if (objs->nents == 1) {
-		dma_resv_unlock(objs->objs[0]->resv);
-	} else {
-		drm_gem_unlock_reservations(objs->objs, objs->nents,
-					    &objs->ticket);
-	}
+	drm_exec_fini(&objs->exec);
 }
 
 void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/13] drm/panfrost: switch to using drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (9 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 10/13] drm/virtgpu: " Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 12/13] drm/v3d: " Christian König
  2023-05-04 11:51 ` [PATCH 13/13] drm: remove drm_gem_(un)lock_reservations Christian König
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Only compile tested for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/panfrost/Kconfig        |  1 +
 drivers/gpu/drm/panfrost/panfrost_job.c | 11 ++++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
index e6403a9d66ad..e86a1a2fd8e1 100644
--- a/drivers/gpu/drm/panfrost/Kconfig
+++ b/drivers/gpu/drm/panfrost/Kconfig
@@ -7,6 +7,7 @@ config DRM_PANFROST
 	depends on !GENERIC_ATOMIC64    # for IOMMU_IO_PGTABLE_LPAE
 	depends on MMU
 	select DRM_SCHED
+	select DRM_EXEC
 	select IOMMU_SUPPORT
 	select IOMMU_IO_PGTABLE_LPAE
 	select DRM_GEM_SHMEM_HELPER
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index dbc597ab46fb..7086a6044355 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -8,6 +8,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/dma-resv.h>
+#include <drm/drm_exec.h>
 #include <drm/gpu_scheduler.h>
 #include <drm/panfrost_drm.h>
 
@@ -275,13 +276,13 @@ static void panfrost_attach_object_fences(struct drm_gem_object **bos,
 int panfrost_job_push(struct panfrost_job *job)
 {
 	struct panfrost_device *pfdev = job->pfdev;
-	struct ww_acquire_ctx acquire_ctx;
+	struct drm_exec exec;
 	int ret = 0;
 
-	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
-					    &acquire_ctx);
+	drm_exec_init(&exec, true);
+	ret = drm_exec_prepare_array(&exec, job->bos, job->bo_count, 1);
 	if (ret)
-		return ret;
+		goto unlock;
 
 	mutex_lock(&pfdev->sched_lock);
 	drm_sched_job_arm(&job->base);
@@ -305,7 +306,7 @@ int panfrost_job_push(struct panfrost_job *job)
 				      job->render_done_fence);
 
 unlock:
-	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
+	drm_exec_fini(&exec);
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/13] drm/v3d: switch to using drm_exec
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (10 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 11/13] drm/panfrost: " Christian König
@ 2023-05-04 11:51 ` Christian König
  2023-05-04 11:51 ` [PATCH 13/13] drm: remove drm_gem_(un)lock_reservations Christian König
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Just a straightforward conversion without any optimization.

Only compile tested for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 43 ++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 2e94ce788c71..75880ffc0cf1 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -10,6 +10,7 @@
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 
+#include <drm/drm_exec.h>
 #include <drm/drm_managed.h>
 #include <drm/drm_syncobj.h>
 #include <uapi/drm/v3d_drm.h>
@@ -249,20 +250,16 @@ v3d_invalidate_caches(struct v3d_dev *v3d)
  * to v3d, so we don't attach dma-buf fences to them.
  */
 static int
-v3d_lock_bo_reservations(struct v3d_job *job,
-			 struct ww_acquire_ctx *acquire_ctx)
+v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
 {
 	int i, ret;
 
-	ret = drm_gem_lock_reservations(job->bo, job->bo_count, acquire_ctx);
+	drm_exec_init(exec, true);
+	ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
 	if (ret)
-		return ret;
+		goto fail;
 
 	for (i = 0; i < job->bo_count; i++) {
-		ret = dma_resv_reserve_fences(job->bo[i]->resv, 1);
-		if (ret)
-			goto fail;
-
 		ret = drm_sched_job_add_implicit_dependencies(&job->base,
 							      job->bo[i], true);
 		if (ret)
@@ -272,7 +269,7 @@ v3d_lock_bo_reservations(struct v3d_job *job,
 	return 0;
 
 fail:
-	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+	drm_exec_fini(exec);
 	return ret;
 }
 
@@ -477,7 +474,7 @@ v3d_push_job(struct v3d_job *job)
 static void
 v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 					 struct v3d_job *job,
-					 struct ww_acquire_ctx *acquire_ctx,
+					 struct drm_exec *exec,
 					 u32 out_sync,
 					 struct v3d_submit_ext *se,
 					 struct dma_fence *done_fence)
@@ -492,7 +489,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 				   DMA_RESV_USAGE_WRITE);
 	}
 
-	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+	drm_exec_fini(exec);
 
 	/* Update the return sync object for the job */
 	/* If it only supports a single signal semaphore*/
@@ -669,7 +666,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	struct v3d_render_job *render = NULL;
 	struct v3d_job *clean_job = NULL;
 	struct v3d_job *last_job;
-	struct ww_acquire_ctx acquire_ctx;
+	struct drm_exec exec;
 	int ret = 0;
 
 	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
@@ -731,7 +728,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail;
 
-	ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
+	ret = v3d_lock_bo_reservations(last_job, &exec);
 	if (ret)
 		goto fail;
 
@@ -775,7 +772,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 	v3d_attach_fences_and_unlock_reservation(file_priv,
 						 last_job,
-						 &acquire_ctx,
+						 &exec,
 						 args->out_sync,
 						 &se,
 						 last_job->done_fence);
@@ -791,8 +788,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 fail_unreserve:
 	mutex_unlock(&v3d->sched_lock);
 fail_perfmon:
-	drm_gem_unlock_reservations(last_job->bo,
-				    last_job->bo_count, &acquire_ctx);
+	drm_exec_fini(&exec);
 fail:
 	v3d_job_cleanup((void *)bin);
 	v3d_job_cleanup((void *)render);
@@ -819,7 +815,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	struct drm_v3d_submit_tfu *args = data;
 	struct v3d_submit_ext se = {0};
 	struct v3d_tfu_job *job = NULL;
-	struct ww_acquire_ctx acquire_ctx;
+	struct drm_exec exec;
 	int ret = 0;
 
 	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
@@ -870,7 +866,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 		job->base.bo[job->base.bo_count] = bo;
 	}
 
-	ret = v3d_lock_bo_reservations(&job->base, &acquire_ctx);
+	ret = v3d_lock_bo_reservations(&job->base, &exec);
 	if (ret)
 		goto fail;
 
@@ -879,7 +875,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	mutex_unlock(&v3d->sched_lock);
 
 	v3d_attach_fences_and_unlock_reservation(file_priv,
-						 &job->base, &acquire_ctx,
+						 &job->base, &exec,
 						 args->out_sync,
 						 &se,
 						 job->base.done_fence);
@@ -914,7 +910,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	struct v3d_submit_ext se = {0};
 	struct v3d_csd_job *job = NULL;
 	struct v3d_job *clean_job = NULL;
-	struct ww_acquire_ctx acquire_ctx;
+	struct drm_exec exec;
 	int ret;
 
 	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
@@ -957,7 +953,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail;
 
-	ret = v3d_lock_bo_reservations(clean_job, &acquire_ctx);
+	ret = v3d_lock_bo_reservations(clean_job, &exec);
 	if (ret)
 		goto fail;
 
@@ -983,7 +979,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 
 	v3d_attach_fences_and_unlock_reservation(file_priv,
 						 clean_job,
-						 &acquire_ctx,
+						 &exec,
 						 args->out_sync,
 						 &se,
 						 clean_job->done_fence);
@@ -996,8 +992,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 fail_unreserve:
 	mutex_unlock(&v3d->sched_lock);
 fail_perfmon:
-	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
-				    &acquire_ctx);
+	drm_exec_fini(&exec);
 fail:
 	v3d_job_cleanup((void *)job);
 	v3d_job_cleanup(clean_job);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/13] drm: remove drm_gem_(un)lock_reservations
  2023-05-04 11:51 Common DRM execution context v4 Christian König
                   ` (11 preceding siblings ...)
  2023-05-04 11:51 ` [PATCH 12/13] drm/v3d: " Christian König
@ 2023-05-04 11:51 ` Christian König
  12 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 11:51 UTC (permalink / raw)
  To: francois.dugast, felix.kuehling, arunpravin.paneerselvam,
	thomas_os, dakr, luben.tuikov, amd-gfx, dri-devel

Not used any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_gem.c              | 78 --------------------------
 drivers/gpu/drm/scheduler/sched_main.c |  5 +-
 include/drm/drm_gem.h                  |  4 --
 3 files changed, 2 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1a5a2cd0d4ec..6666cd411002 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1214,84 +1214,6 @@ void drm_gem_vunmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
 }
 EXPORT_SYMBOL(drm_gem_vunmap_unlocked);
 
-/**
- * drm_gem_lock_reservations - Sets up the ww context and acquires
- * the lock on an array of GEM objects.
- *
- * Once you've locked your reservations, you'll want to set up space
- * for your shared fences (if applicable), submit your job, then
- * drm_gem_unlock_reservations().
- *
- * @objs: drm_gem_objects to lock
- * @count: Number of objects in @objs
- * @acquire_ctx: struct ww_acquire_ctx that will be initialized as
- * part of tracking this set of locked reservations.
- */
-int
-drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
-			  struct ww_acquire_ctx *acquire_ctx)
-{
-	int contended = -1;
-	int i, ret;
-
-	ww_acquire_init(acquire_ctx, &reservation_ww_class);
-
-retry:
-	if (contended != -1) {
-		struct drm_gem_object *obj = objs[contended];
-
-		ret = dma_resv_lock_slow_interruptible(obj->resv,
-								 acquire_ctx);
-		if (ret) {
-			ww_acquire_fini(acquire_ctx);
-			return ret;
-		}
-	}
-
-	for (i = 0; i < count; i++) {
-		if (i == contended)
-			continue;
-
-		ret = dma_resv_lock_interruptible(objs[i]->resv,
-							    acquire_ctx);
-		if (ret) {
-			int j;
-
-			for (j = 0; j < i; j++)
-				dma_resv_unlock(objs[j]->resv);
-
-			if (contended != -1 && contended >= i)
-				dma_resv_unlock(objs[contended]->resv);
-
-			if (ret == -EDEADLK) {
-				contended = i;
-				goto retry;
-			}
-
-			ww_acquire_fini(acquire_ctx);
-			return ret;
-		}
-	}
-
-	ww_acquire_done(acquire_ctx);
-
-	return 0;
-}
-EXPORT_SYMBOL(drm_gem_lock_reservations);
-
-void
-drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
-			    struct ww_acquire_ctx *acquire_ctx)
-{
-	int i;
-
-	for (i = 0; i < count; i++)
-		dma_resv_unlock(objs[i]->resv);
-
-	ww_acquire_fini(acquire_ctx);
-}
-EXPORT_SYMBOL(drm_gem_unlock_reservations);
-
 /**
  * drm_gem_lru_init - initialize a LRU
  *
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index b09cdacfd062..2d8249148926 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -794,9 +794,8 @@ EXPORT_SYMBOL(drm_sched_job_add_resv_dependencies);
  * @write: whether the job might write the object (so we need to depend on
  * shared fences in the reservation object).
  *
- * This should be called after drm_gem_lock_reservations() on your array of
- * GEM objects used in the job but before updating the reservations with your
- * own fences.
+ * This should be called after locking your GEM objects used in the job but
+ * before updating the reservations with your own fences.
  *
  * Returns:
  * 0 on success, or an error on failing to expand the array.
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index b8efd836edef..7e027688a83d 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -476,10 +476,6 @@ int drm_gem_objects_lookup(struct drm_file *filp, void __user *bo_handles,
 struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);
 long drm_gem_dma_resv_wait(struct drm_file *filep, u32 handle,
 				    bool wait_all, unsigned long timeout);
-int drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
-			      struct ww_acquire_ctx *acquire_ctx);
-void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
-				 struct ww_acquire_ctx *acquire_ctx);
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/13] drm: add drm_exec selftests v2
  2023-05-04 11:51 ` [PATCH 02/13] drm: add drm_exec selftests v2 Christian König
@ 2023-05-04 12:07   ` Maíra Canal
  2023-05-04 12:52     ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Maíra Canal @ 2023-05-04 12:07 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel

Hi Christian,

It would be nice if you used the KUnit macros instead of pr_info.

On 5/4/23 08:51, Christian König wrote:
> Largely just the initial skeleton.
> 
> v2: add array test as well
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/Kconfig               |  1 +
>   drivers/gpu/drm/tests/Makefile        |  3 +-
>   drivers/gpu/drm/tests/drm_exec_test.c | 96 +++++++++++++++++++++++++++
>   3 files changed, 99 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/gpu/drm/tests/drm_exec_test.c
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 2dc81eb062eb..068e574e234e 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -80,6 +80,7 @@ config DRM_KUNIT_TEST
>   	select DRM_BUDDY
>   	select DRM_EXPORT_FOR_TESTS if m
>   	select DRM_KUNIT_TEST_HELPERS
> +	select DRM_EXEC
>   	default KUNIT_ALL_TESTS
>   	help
>   	  This builds unit tests for DRM. This option is not useful for
> diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile
> index bca726a8f483..ba7baa622675 100644
> --- a/drivers/gpu/drm/tests/Makefile
> +++ b/drivers/gpu/drm/tests/Makefile
> @@ -17,6 +17,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
>   	drm_modes_test.o \
>   	drm_plane_helper_test.o \
>   	drm_probe_helper_test.o \
> -	drm_rect_test.o
> +	drm_rect_test.o	\
> +	drm_exec_test.o
>   
>   CFLAGS_drm_mm_test.o := $(DISABLE_STRUCTLEAK_PLUGIN)
> diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
> new file mode 100644
> index 000000000000..26aa13e62d22
> --- /dev/null
> +++ b/drivers/gpu/drm/tests/drm_exec_test.c
> @@ -0,0 +1,96 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#define pr_fmt(fmt) "drm_exec: " fmt
> +
> +#include <kunit/test.h>
> +
> +#include <linux/module.h>
> +#include <linux/prime_numbers.h>
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_device.h>
> +#include <drm/drm_gem.h>
> +
> +#include "../lib/drm_random.h"
> +
> +static struct drm_device dev;
> +
> +static void drm_exec_sanitycheck(struct kunit *test)
> +{
> +	struct drm_exec exec;
> +
> +	drm_exec_init(&exec, true);
> +	drm_exec_fini(&exec);
> +	pr_info("%s - ok!\n", __func__);

Here you could use KUNIT_SUCCEED(test).

> +}
> +
> +static void drm_exec_lock1(struct kunit *test)

Is there a reason to call the function drm_exec_lock1 instead of
just drm_exec_lock?

> +{
> +	struct drm_gem_object gobj = { };
> +	struct drm_exec exec;
> +	int ret;
> +
> +	drm_gem_private_object_init(&dev, &gobj, PAGE_SIZE);
> +
> +	drm_exec_init(&exec, true);
> +	drm_exec_while_not_all_locked(&exec) {
> +		ret = drm_exec_prepare_obj(&exec, &gobj, 1);
> +		drm_exec_continue_on_contention(&exec);
> +		if (ret) {
> +			drm_exec_fini(&exec);
> +			pr_err("%s - err %d!\n", __func__, ret);

Here you could use KUNIT_FAIL. Same for the other function.

Actually, it would be better if you created an `exit` function
associated with the test suite, where you would call drm_exec_fini(),
and then checked the ret variable with KUNIT_EXPECT_EQ(test, ret, 0)
in the test itself.
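
Something along these lines, for instance (just a sketch; allocating the
exec context in an init function and stashing it in test->priv is only
one way of doing it):

	/* Sketch only: function names are made up, not from this series. */
	static int drm_exec_test_init(struct kunit *test)
	{
		struct drm_exec *exec;

		exec = kunit_kzalloc(test, sizeof(*exec), GFP_KERNEL);
		if (!exec)
			return -ENOMEM;

		drm_exec_init(exec, true);
		test->priv = exec;
		return 0;
	}

	static void drm_exec_test_exit(struct kunit *test)
	{
		drm_exec_fini(test->priv);
	}

wired up with .init = drm_exec_test_init and .exit = drm_exec_test_exit
in drm_exec_test_suite. The test cases then just grab the exec from
test->priv and the checks become KUNIT_EXPECT_EQ(test, ret, 0).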

> +			return;
> +		}
> +	}
> +	drm_exec_fini(&exec);
> +	pr_info("%s - ok!\n", __func__);
> +}
> +
> +static void drm_exec_lock_array(struct kunit *test)
> +{
> +	struct drm_gem_object gobj1 = { };
> +	struct drm_gem_object gobj2 = { };
> +	struct drm_gem_object *array[] = { &gobj1, &gobj2 };
> +	struct drm_exec exec;
> +	int ret;
> +
> +	drm_gem_private_object_init(&dev, &gobj1, PAGE_SIZE);
> +	drm_gem_private_object_init(&dev, &gobj2, PAGE_SIZE);
> +
> +	drm_exec_init(&exec, true);
> +	ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 0);
> +	if (ret) {
> +		drm_exec_fini(&exec);
> +		pr_err("%s - err %d!\n", __func__, ret);
> +		return;
> +	}
> +	drm_exec_fini(&exec);
> +	pr_info("%s - ok!\n", __func__);
> +}
> +
> +static int drm_exec_suite_init(struct kunit_suite *suite)
> +{
> +	kunit_info(suite, "Testing DRM exec manager\n");

Isn't this already clear from the name of the test?

Best Regards,
- Maíra Canal

> +	return 0;
> +}
> +
> +static struct kunit_case drm_exec_tests[] = {
> +	KUNIT_CASE(drm_exec_sanitycheck),
> +	KUNIT_CASE(drm_exec_lock1),
> +	KUNIT_CASE(drm_exec_lock_array),
> +	{}
> +};
> +
> +static struct kunit_suite drm_exec_test_suite = {
> +	.name = "drm_exec",
> +	.suite_init = drm_exec_suite_init,
> +	.test_cases = drm_exec_tests,
> +};
> +
> +kunit_test_suite(drm_exec_test_suite);
> +
> +MODULE_AUTHOR("AMD");
> +MODULE_LICENSE("GPL and additional rights");

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/13] drm: add drm_exec selftests v2
  2023-05-04 12:07   ` Maíra Canal
@ 2023-05-04 12:52     ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-05-04 12:52 UTC (permalink / raw)
  To: Maíra Canal, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel

Hi Maíra,

On 04.05.23 at 14:07, Maíra Canal wrote:
> Hi Christian,
>
> It would be nice if you use the KUnit macros, instead of pr_info.

Yeah, this was initially written before the DRM tests moved to KUnit and 
I only quickly converted it over. Going to give this a cleanup.

Thanks,
Christian.

>
> On 5/4/23 08:51, Christian König wrote:
>> Largely just the initial skeleton.
>>
>> v2: add array test as well
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/Kconfig               |  1 +
>>   drivers/gpu/drm/tests/Makefile        |  3 +-
>>   drivers/gpu/drm/tests/drm_exec_test.c | 96 +++++++++++++++++++++++++++
>>   3 files changed, 99 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/tests/drm_exec_test.c
>>
>> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index 2dc81eb062eb..068e574e234e 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -80,6 +80,7 @@ config DRM_KUNIT_TEST
>>       select DRM_BUDDY
>>       select DRM_EXPORT_FOR_TESTS if m
>>       select DRM_KUNIT_TEST_HELPERS
>> +    select DRM_EXEC
>>       default KUNIT_ALL_TESTS
>>       help
>>         This builds unit tests for DRM. This option is not useful for
>> diff --git a/drivers/gpu/drm/tests/Makefile 
>> b/drivers/gpu/drm/tests/Makefile
>> index bca726a8f483..ba7baa622675 100644
>> --- a/drivers/gpu/drm/tests/Makefile
>> +++ b/drivers/gpu/drm/tests/Makefile
>> @@ -17,6 +17,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \
>>       drm_modes_test.o \
>>       drm_plane_helper_test.o \
>>       drm_probe_helper_test.o \
>> -    drm_rect_test.o
>> +    drm_rect_test.o    \
>> +    drm_exec_test.o
>>     CFLAGS_drm_mm_test.o := $(DISABLE_STRUCTLEAK_PLUGIN)
>> diff --git a/drivers/gpu/drm/tests/drm_exec_test.c 
>> b/drivers/gpu/drm/tests/drm_exec_test.c
>> new file mode 100644
>> index 000000000000..26aa13e62d22
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tests/drm_exec_test.c
>> @@ -0,0 +1,96 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2019 Intel Corporation
>> + */
>> +
>> +#define pr_fmt(fmt) "drm_exec: " fmt
>> +
>> +#include <kunit/test.h>
>> +
>> +#include <linux/module.h>
>> +#include <linux/prime_numbers.h>
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/drm_device.h>
>> +#include <drm/drm_gem.h>
>> +
>> +#include "../lib/drm_random.h"
>> +
>> +static struct drm_device dev;
>> +
>> +static void drm_exec_sanitycheck(struct kunit *test)
>> +{
>> +    struct drm_exec exec;
>> +
>> +    drm_exec_init(&exec, true);
>> +    drm_exec_fini(&exec);
>> +    pr_info("%s - ok!\n", __func__);
>
> Here you could use KUNIT_SUCCEED(test).
>
>> +}
>> +
>> +static void drm_exec_lock1(struct kunit *test)
>
> Is there a reason to call the function drm_exec_lock1 instead of
> just drm_exec_lock?
>
>> +{
>> +    struct drm_gem_object gobj = { };
>> +    struct drm_exec exec;
>> +    int ret;
>> +
>> +    drm_gem_private_object_init(&dev, &gobj, PAGE_SIZE);
>> +
>> +    drm_exec_init(&exec, true);
>> +    drm_exec_while_not_all_locked(&exec) {
>> +        ret = drm_exec_prepare_obj(&exec, &gobj, 1);
>> +        drm_exec_continue_on_contention(&exec);
>> +        if (ret) {
>> +            drm_exec_fini(&exec);
>> +            pr_err("%s - err %d!\n", __func__, ret);
>
> Here you could use KUNIT_FAIL. Same for the other function.
>
> Actually, it would be better if you created a function `exit`
> associated with the test suite, where you would call drm_exec_fini,
> and checked the ret variable with KUNIT_EXPECT_EQ(test, ret, 0) in
> the test.
>
>> +            return;
>> +        }
>> +    }
>> +    drm_exec_fini(&exec);
>> +    pr_info("%s - ok!\n", __func__);
>> +}
>> +
>> +static void drm_exec_lock_array(struct kunit *test)
>> +{
>> +    struct drm_gem_object gobj1 = { };
>> +    struct drm_gem_object gobj2 = { };
>> +    struct drm_gem_object *array[] = { &gobj1, &gobj2 };
>> +    struct drm_exec exec;
>> +    int ret;
>> +
>> +    drm_gem_private_object_init(&dev, &gobj1, PAGE_SIZE);
>> +    drm_gem_private_object_init(&dev, &gobj2, PAGE_SIZE);
>> +
>> +    drm_exec_init(&exec, true);
>> +    ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array), 0);
>> +    if (ret) {
>> +        drm_exec_fini(&exec);
>> +        pr_err("%s - err %d!\n", __func__, ret);
>> +        return;
>> +    }
>> +    drm_exec_fini(&exec);
>> +    pr_info("%s - ok!\n", __func__);
>> +}
>> +
>> +static int drm_exec_suite_init(struct kunit_suite *suite)
>> +{
>> +    kunit_info(suite, "Testing DRM exec manager\n");
>
> Isn't this already clear by the name of the test?
>
> Best Regards,
> - Maíra Canal
>
>> +    return 0;
>> +}
>> +
>> +static struct kunit_case drm_exec_tests[] = {
>> +    KUNIT_CASE(drm_exec_sanitycheck),
>> +    KUNIT_CASE(drm_exec_lock1),
>> +    KUNIT_CASE(drm_exec_lock_array),
>> +    {}
>> +};
>> +
>> +static struct kunit_suite drm_exec_test_suite = {
>> +    .name = "drm_exec",
>> +    .suite_init = drm_exec_suite_init,
>> +    .test_cases = drm_exec_tests,
>> +};
>> +
>> +kunit_test_suite(drm_exec_test_suite);
>> +
>> +MODULE_AUTHOR("AMD");
>> +MODULE_LICENSE("GPL and additional rights");


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
@ 2023-05-04 14:02   ` Thomas Hellström (Intel)
  2023-05-25 20:42   ` Danilo Krummrich
  2023-06-14 12:23   ` Boris Brezillon
  2 siblings, 0 replies; 50+ messages in thread
From: Thomas Hellström (Intel) @ 2023-05-04 14:02 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, dakr, luben.tuikov, amd-gfx, dri-devel


On 5/4/23 13:51, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
>
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
>
> v2: drop xarray and use dynamic resized array instead, the locking
>      overhead is unecessary and measurable.
> v3: drop duplicate tracking, radeon is really the only one needing that.
> v4: fixes issues pointed out by Danilo, some typos in comments and a
>      helper for lock arrays of GEM objects.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
...
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;

Hi, Christian. Did you consider using a list here, with the links embedded in
the gem objects, now that only locked objects are to be put on the list/array?

That should work, since only the process owning the lock may access the list
link. Apart from getting rid of the reallocation, this is also beneficial for
the more general types of ww transactions that are used by i915 (and to a
minor extent by xe as well, I think).

In those cases we would want to unlock a temporarily held object within
the while_not_all_locked() loop and would then have to search the entire
array for the correct pointer. Instead, one could just remove it from the
list of locked objects.
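
Just to illustrate what I mean (member names are made up, this is not code
from any existing series):

	/* In struct drm_gem_object, only ever touched by the task that
	 * currently holds the object's dma_resv lock:
	 */
	struct list_head exec_link;

	/* In struct drm_exec: */
	struct list_head locked;

	/* Tracking a freshly locked object then becomes: */
	list_add_tail(&obj->exec_link, &exec->locked);

	/* And dropping a single, temporarily held object is just: */
	list_del(&obj->exec_link);
	dma_resv_unlock(obj->resv);

No reallocation, and no need to search the array to unlock one object.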

Thanks,

Thomas


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
  2023-05-04 14:02   ` Thomas Hellström (Intel)
@ 2023-05-25 20:42   ` Danilo Krummrich
  2023-06-14 12:23   ` Boris Brezillon
  2 siblings, 0 replies; 50+ messages in thread
From: Danilo Krummrich @ 2023-05-25 20:42 UTC (permalink / raw)
  To: Christian König
  Cc: arunpravin.paneerselvam, felix.kuehling, francois.dugast,
	amd-gfx, luben.tuikov, dri-devel, thomas_os

On 5/4/23 13:51, Christian König wrote:
> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.
> 
> v2: drop xarray and use dynamic resized array instead, the locking
>      overhead is unecessary and measurable.
> v3: drop duplicate tracking, radeon is really the only one needing that.
> v4: fixes issues pointed out by Danilo, some typos in comments and a
>      helper for lock arrays of GEM objects.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

Reviewed-by: Danilo Krummrich <dakr@redhat.com>

> ---
>   Documentation/gpu/drm-mm.rst |  12 ++
>   drivers/gpu/drm/Kconfig      |   6 +
>   drivers/gpu/drm/Makefile     |   2 +
>   drivers/gpu/drm/drm_exec.c   | 278 +++++++++++++++++++++++++++++++++++
>   include/drm/drm_exec.h       | 119 +++++++++++++++
>   5 files changed, 417 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index a79fd3549ff8..a52e6f4117d6 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -493,6 +493,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index ba3fb04bb691..2dc81eb062eb 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -201,6 +201,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index a33257d2bc7f..9c6446eb3c83 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..18071bff20f4
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,278 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *
> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */
> +
> +/* Dummy value used to initially enter the retry loop */
> +#define DRM_EXEC_DUMMY (void*)~0
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +
> +	drm_gem_object_put(exec->prelocked);
> +	exec->prelocked = NULL;
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, bool interruptible)
> +{
> +	exec->interruptible = interruptible;
> +	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
> +	exec->num_objects = 0;
> +	exec->contended = DRM_EXEC_DUMMY;
> +	exec->prelocked = NULL;
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	kvfree(exec->objects);
> +	if (exec->contended != DRM_EXEC_DUMMY) {
> +		drm_gem_object_put(exec->contended);
> +		ww_acquire_fini(&exec->ticket);
> +	}
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_cleanup - cleanup when contention is detected
> + * @exec: the drm_exec object to cleanup
> + *
> + * Cleanup the current state and return true if we should stay inside the retry
> + * loop, false if there wasn't any contention detected and we can keep the
> + * objects locked.
> + */
> +bool drm_exec_cleanup(struct drm_exec *exec)
> +{
> +	if (likely(!exec->contended)) {
> +		ww_acquire_done(&exec->ticket);
> +		return false;
> +	}
> +
> +	if (likely(exec->contended == DRM_EXEC_DUMMY)) {
> +		exec->contended = NULL;
> +		ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +		return true;
> +	}
> +
> +	drm_exec_unlock_all(exec);
> +	exec->num_objects = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(drm_exec_cleanup);
> +
> +/* Track the locked object in the array */
> +static int drm_exec_obj_locked(struct drm_exec *exec,
> +			       struct drm_gem_object *obj)
> +{
> +	if (unlikely(exec->num_objects == exec->max_objects)) {
> +		size_t size = exec->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		exec->objects = tmp;
> +		exec->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	exec->objects[exec->num_objects++] = obj;
> +
> +	return 0;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->interruptible) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (unlikely(ret)) {
> +		dma_resv_unlock(obj->resv);
> +		goto error_dropref;
> +	}
> +
> +	swap(exec->prelocked, obj);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
> + * already locked, -ENOMEM when memory allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->prelocked == obj) {
> +		drm_gem_object_put(exec->prelocked);
> +		exec->prelocked = NULL;
> +
> +		return dma_resv_reserve_fences(obj->resv, num_fences);
> +	}
> +
> +	if (exec->interruptible)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret))
> +		return ret;
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (ret)
> +		goto error_unlock;
> +
> +	/* Keep locked when reserving fences fails */
> +	return dma_resv_reserve_fences(obj->resv, num_fences);
> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aports on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	drm_exec_while_not_all_locked(exec) {
> +		for (unsigned int i = 0; i < num_objects; ++i) {
> +			ret = drm_exec_prepare_obj(exec, objects[i],
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);
> +			if (unlikely(ret))
> +				return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_array);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..7c7481ed088a
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @interruptible: If locks should be taken interruptible
> +	 */
> +	bool			interruptible;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for
> +	 */
> +	struct drm_gem_object	*contended;
> +
> +	/**
> +	 * @prelocked: already locked GEM object due to contention
> +	 */
> +	struct drm_gem_object *prelocked;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	for (index = 0, obj = (exec)->objects[0];		\
> +	     index < (exec)->num_objects;			\
> +	     ++index, obj = (exec)->objects[index])
> +
> +/**
> + * drm_exec_while_not_all_locked - loop until all GEM objects are prepared
> + * @exec: drm_exec object
> + *
> + * Core functionality of the drm_exec object. Loops until all GEM objects are
> + * prepared and no more contention exists.
> + *
> + * At the beginning of the loop it is guaranteed that no GEM object is locked.
> + */
> +#define drm_exec_while_not_all_locked(exec)	\
> +	while (drm_exec_cleanup(exec))
> +
> +/**
> + * drm_exec_continue_on_contention - continue the loop when we need to cleanup
> + * @exec: drm_exec object
> + *
> + * Control flow helper to continue when a contention was detected and we need to
> + * clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_continue_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		continue
> +
> +/**
> + * drm_exec_break_on_contention - break a subordinal loop on contention
> + * @exec: drm_exec object
> + *
> + * Control flow helper to break a subordinal loop when a contention was detected
> + * and we need to clean up and re-start the loop to prepare all GEM objects.
> + */
> +#define drm_exec_break_on_contention(exec)		\
> +	if (unlikely(drm_exec_is_contended(exec)))	\
> +		break
> +
> +/**
> + * drm_exec_is_contended - check for contention
> + * @exec: drm_exec object
> + *
> + * Returns true if the drm_exec object has run into some contention while
> + * locking a GEM object and needs to clean up.
> + */
> +static inline bool drm_exec_is_contended(struct drm_exec *exec)
> +{
> +	return !!exec->contended;
> +}
> +
> +void drm_exec_init(struct drm_exec *exec, bool interruptible);
> +void drm_exec_fini(struct drm_exec *exec);
> +bool drm_exec_cleanup(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences);
> +
> +#endif


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-05-04 11:51 ` [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2 Christian König
@ 2023-06-12 13:16   ` Tatsuyuki Ishi
  2023-06-20  4:07   ` Tatsuyuki Ishi
  1 sibling, 0 replies; 50+ messages in thread
From: Tatsuyuki Ishi @ 2023-06-12 13:16 UTC (permalink / raw)
  To: ckoenig.leichtzumerken
  Cc: arunpravin.paneerselvam, felix.kuehling, francois.dugast,
	dri-devel, luben.tuikov, dakr, amd-gfx, thomas_os

Hi Christian,
> On May 4, 2023, at 20:51, Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> Use the new component here as well and remove the old handling.
>
> v2: drop duplicate handling

It seems that after dropping the duplicate handling, locking of VM PDs on the global BO list is basically broken everywhere,
since bo->tbo.base.resv == vm->root.bo->tbo.base.resv for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID BOs.

Perhaps we need to bring dup handling back?

Tatsuyuki


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
  2023-05-04 14:02   ` Thomas Hellström (Intel)
  2023-05-25 20:42   ` Danilo Krummrich
@ 2023-06-14 12:23   ` Boris Brezillon
  2023-06-14 12:30     ` Christian König
  2 siblings, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-14 12:23 UTC (permalink / raw)
  To: Christian König
  Cc: arunpravin.paneerselvam, felix.kuehling, francois.dugast,
	amd-gfx, luben.tuikov, dakr, dri-devel, thomas_os

Hi Christian,

On Thu,  4 May 2023 13:51:47 +0200
"Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> This adds the infrastructure for an execution context for GEM buffers
> which is similar to the existing TTMs execbuf util and intended to replace
> it in the long term.
> 
> The basic functionality is that we abstracts the necessary loop to lock
> many different GEM buffers with automated deadlock and duplicate handling.

As many other drivers do already, we are considering using drm_exec()
for our resv locking in the PowerVR driver, so we might have more
questions/comments in the coming days/weeks, but I already have a
couple right now (see below).

> v3: drop duplicate tracking, radeon is really the only one needing that

I think we'd actually be interested in duplicate tracking. Is there any
way we can make it an optional feature through some extra helpers/flags?
This doesn't have to be done in this patch series; I'm just wondering if
it is something we can share as well.
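
Something as simple as turning the drm_exec_init() bool into a flags
parameter would probably be enough for us, e.g. (the flags below don't
exist, they're just to illustrate the idea):

	/* Made-up flags, only to sketch the opt-in. */
	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
			     DRM_EXEC_IGNORE_DUPLICATES);

with drm_exec_prepare_obj() then silently succeeding (apart from reserving
the extra fence slots) on objects that are already locked, instead of
returning -EALREADY.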

[...]

> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + *	struct drm_gem_object *obj;
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	drm_exec_while_not_all_locked(&exec) {
> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *

Have you considered defining a drm_exec_try_prepare_obj_or_retry()
combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?

#define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
        ({ \
                int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
                if (unlikely(drm_exec_is_contended(exec))) \
                        continue; \
                __ret; \
        })

This way the following pattern

		ret = drm_exec_prepare_obj(&exec, boA, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;

can be turned into something more conventional:

		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
		if (ret)
			goto error;

I guess we could even add static checks to make sure
drm_exec_try_prepare_obj() is called inside a
drm_exec_while_not_all_locked() loop.

> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *		drm_exec_continue_on_contention(&exec);
> + *		if (ret)
> + *			goto error;
> + *	}
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct dma_exec for more details.
> + */

[...]

> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aports on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	drm_exec_while_not_all_locked(exec) {
> +		for (unsigned int i = 0; i < num_objects; ++i) {
> +			ret = drm_exec_prepare_obj(exec, objects[i],
> +						   num_fences);
> +			drm_exec_break_on_contention(exec);

I had a hard time understanding what the intent was here: we do want the
locking to keep going on contention (reset and retry), but we need to
break out of the inner loop for this to happen, which is what this
drm_exec_break_on_contention() is doing. My misunderstanding came from
the fact that I expected drm_exec_break_on_contention() to stop the
process of preparing objects. Maybe it's just me, but I think it'd be
less confusing if we got rid of drm_exec_{break,continue}_on_contention
and adjusted the loop slightly:

	unsigned int obj_ptr = 0;

	drm_exec_while_not_all_locked(exec) {
		int ret;

		/* We acquired/prepared all objects, we can leave the loop now. */
		if (obj_ptr == num_objects)
			break;

		ret = drm_exec_try_prepare_obj_or_retry(exec, objects[obj_ptr++],
							num_fences);
		if (ret)
			return ret;
	}

	return 0;

Of course, this is just my personal view on this, and none of these
comments should be considered as blockers, but I thought I'd share
my concerns anyway.

Thanks again for your work!

Regards,

Boris


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-14 12:23   ` Boris Brezillon
@ 2023-06-14 12:30     ` Christian König
  2023-06-14 13:02       ` Boris Brezillon
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-06-14 12:30 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: arunpravin.paneerselvam, felix.kuehling, francois.dugast,
	amd-gfx, luben.tuikov, dakr, dri-devel, thomas_os



On 14.06.23 at 14:23, Boris Brezillon wrote:
> Hi Christian,
>
> On Thu,  4 May 2023 13:51:47 +0200
> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> This adds the infrastructure for an execution context for GEM buffers
>> which is similar to the existing TTMs execbuf util and intended to replace
>> it in the long term.
>>
>> The basic functionality is that we abstracts the necessary loop to lock
>> many different GEM buffers with automated deadlock and duplicate handling.
> As many other drivers do already, we are considering using drm_exec()
> for our resv locking in the PowerVR driver, so we might have more
> questions/comments in the coming days/weeks, but I already have a
> couple right now (see below).
>
>> v3: drop duplicate tracking, radeon is really the only one needing that
> I think we'd actually be interested in duplicate tracking. Is there any
> way we can make it an optional feature through some extra helpers/flags?
> Doesn't have to be done in this patch series, I'm just wondering if this
> is something we can share as well.

You can still capture the -EALREADY error and act appropriately in your 
driver.

For radeon it just means ignoring the error code and going ahead, but 
that behavior doesn't seem to be desired in most cases.
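
In other words, a driver which wants to treat duplicates as harmless can
just do something like this (only a sketch of the pattern, not code from
this patch set):

	ret = drm_exec_prepare_obj(&exec, obj, num_fences);
	if (ret == -EALREADY) {
		/* Already locked by this context, just make sure the
		 * extra fence slots are reserved as well.
		 */
		ret = dma_resv_reserve_fences(obj->resv, num_fences);
	}
	if (ret)
		goto error;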

Initially I thought we would need to separately track how many and how often 
BOs are duplicated, but there is simply no use for this.

>
> [...]
>
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the cleanup procedure
>> + * unlocks all previously locked GEM objects and locks the contended one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + *	struct drm_gem_object *obj;
>> + *	struct drm_exec exec;
>> + *	unsigned long index;
>> + *	int ret;
>> + *
>> + *	drm_exec_init(&exec, true);
>> + *	drm_exec_while_not_all_locked(&exec) {
>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *		drm_exec_continue_on_contention(&exec);
>> + *		if (ret)
>> + *			goto error;
>> + *
> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
>
> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
>          ({ \
>                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
>                  if (unlikely(drm_exec_is_contended(exec))) \
>                          continue; \
>                  __ret; \
>          })
>
> This way the following pattern
>
> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> 		drm_exec_continue_on_contention(&exec);
> 		if (ret)
> 			goto error;
>
> can be turned into something more conventional:
>
> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> 		if (ret)
> 			goto error;

Yeah, I was considering that as well, but then abandoned it as too
complicated.

I really need to find some time to work on that anyway.

>
> I guess we could even add static checks to make sure
> drm_exec_try_prepare_obj() is called inside a
> drm_exec_while_not_all_locked() loop.

Interesting idea, but how would somebody do that?

Regards,
Christian.

>
>> + *		ret = drm_exec_prepare_obj(&exec, boB, 1);
>> + *		drm_exec_continue_on_contention(&exec);
>> + *		if (ret)
>> + *			goto error;
>> + *	}
>> + *
>> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *		...
>> + *	}
>> + *	drm_exec_fini(&exec);
>> + *
>> + * See struct dma_exec for more details.
>> + */
> [...]
>
>> +/**
>> + * drm_exec_prepare_array - helper to prepare an array of objects
>> + * @exec: the drm_exec object with the state
>> + * @objects: array of GEM object to prepare
>> + * @num_objects: number of GEM objects in the array
>> + * @num_fences: number of fences to reserve on each GEM object
>> + *
>> + * Prepares all GEM objects in an array, handles contention but aports on first
>> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
>> + *
>> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +			   struct drm_gem_object **objects,
>> +			   unsigned int num_objects,
>> +			   unsigned int num_fences)
>> +{
>> +	int ret;
>> +
>> +	drm_exec_while_not_all_locked(exec) {
>> +		for (unsigned int i = 0; i < num_objects; ++i) {
>> +			ret = drm_exec_prepare_obj(exec, objects[i],
>> +						   num_fences);
>> +			drm_exec_break_on_contention(exec);
> I had a hard time understanding what the intent was here: we do want the
> locking to keep going on contention (reset and retry), but we need to
> break out of the inner loop for this to happen, which is what this
> drm_exec_break_on_contention() is doing. My misunderstanding was coming
> from the fact I was expecting drm_exec_break_on_contention() to stop
> the process of preparing objects. Maybe it's just me, but I think it'd
> be less confusing if we were getting rid of
> drm_exec_{break,continue}_on_contention and have the loop slightly
> adjusted:
>
> 	unsigned int obj_ptr = 0;
>
> 	drm_exec_while_not_all_locked(exec) {
> 		int ret;
>
> 		/* We acquired/prepared all objects, we can leave the loop now. */
> 		if (obj_ptr == num_objects)
> 			break;
>
> 		ret = drm_exec_try_prepare_obj_or_retry(exec, objects[obj_ptr++],
> 							num_fences);
> 		if (ret)
> 			return ret;
> 	}
>
> 	return 0;
>
> Of course, this is just my personal view on this, and none of these
> comments should be considered as blockers, but I thought I'd share
> my concerns anyway.
>
> Thanks again for your work!
>
> Regards,
>
> Boris
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-14 12:30     ` Christian König
@ 2023-06-14 13:02       ` Boris Brezillon
  2023-06-17 11:54         ` Boris Brezillon
  0 siblings, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-14 13:02 UTC (permalink / raw)
  To: Christian König
  Cc: arunpravin.paneerselvam, felix.kuehling, francois.dugast,
	amd-gfx, luben.tuikov, dakr, dri-devel, thomas_os

On Wed, 14 Jun 2023 14:30:53 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 14.06.23 um 14:23 schrieb Boris Brezillon:
> > Hi Christian,
> >
> > On Thu,  4 May 2023 13:51:47 +0200
> > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> >  
> >> This adds the infrastructure for an execution context for GEM buffers
> >> which is similar to the existing TTMs execbuf util and intended to replace
> >> it in the long term.
> >>
> >> The basic functionality is that we abstracts the necessary loop to lock
> >> many different GEM buffers with automated deadlock and duplicate handling.  
> > As many other drivers do already, we are considering using drm_exec()
> > for our resv locking in the PowerVR driver, so we might have more
> > questions/comments in the coming days/weeks, but I already have a
> > couple right now (see below).
> >  
> >> v3: drop duplicate tracking, radeon is really the only one needing that  
> > I think we'd actually be interested in duplicate tracking. Is there any
> > way we can make it an optional feature through some extra helpers/flags?
> > Doesn't have to be done in this patch series, I'm just wondering if this
> > is something we can share as well.  
> 
> You can still capture the -EALREADY error and act appropriately in your 
> driver.
> 
> For radeon it just means ignoring the error code and going ahead, but 
> that behavior doesn't seem to be desired in most cases.
> 
> Initially I though we need to separately track how many and how often 
> BOs are duplicated, but there is simply no use for this.
> 
> >
> > [...]
> >  
> >> +/**
> >> + * DOC: Overview
> >> + *
> >> + * This component mainly abstracts the retry loop necessary for locking
> >> + * multiple GEM objects while preparing hardware operations (e.g. command
> >> + * submissions, page table updates etc..).
> >> + *
> >> + * If a contention is detected while locking a GEM object the cleanup procedure
> >> + * unlocks all previously locked GEM objects and locks the contended one first
> >> + * before locking any further objects.
> >> + *
> >> + * After an object is locked fences slots can optionally be reserved on the
> >> + * dma_resv object inside the GEM object.
> >> + *
> >> + * A typical usage pattern should look like this::
> >> + *
> >> + *	struct drm_gem_object *obj;
> >> + *	struct drm_exec exec;
> >> + *	unsigned long index;
> >> + *	int ret;
> >> + *
> >> + *	drm_exec_init(&exec, true);
> >> + *	drm_exec_while_not_all_locked(&exec) {
> >> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >> + *		drm_exec_continue_on_contention(&exec);
> >> + *		if (ret)
> >> + *			goto error;
> >> + *  
> > Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> > combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> >
> > #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> >          ({ \
> >                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> >                  if (unlikely(drm_exec_is_contended(exec))) \
> >                          continue; \
> >                  __ret; \
> >          })
> >
> > This way the following pattern
> >
> > 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > 		drm_exec_continue_on_contention(&exec);
> > 		if (ret)
> > 			goto error;
> >
> > can be turned into something more conventional:
> >
> > 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> > 		if (ret)
> > 			goto error;  
> 
> Yeah, I was considering that as well. But then abandoned it as to 
> complicated.
> 
> I really need to find some time to work on that anyway.
> 
> >
> > I guess we could even add static checks to make sure
> > drm_exec_try_prepare_obj() is called inside a
> > drm_exec_while_not_all_locked() loop.  
> 
> Interesting idea, but how would somebody do that?

There are probably better/cleaner ways, but the below diff
seems to catch cases where drm_exec_try_prepare_obj() is
called in a context where break/continue are allowed, but
that's not inside a drm_exec_while_not_all_locked() section.

What's still missing, though, is a way to detect when it's called
from an inner loop.

---
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index 7c7481ed088a..1f4e0e1a7783 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -69,8 +69,10 @@ struct drm_exec {
  *
  * At the beginning of the loop it is guaranteed that no GEM object is locked.
  */
+#define __in_drm_exec_while_not_all_locked false
 #define drm_exec_while_not_all_locked(exec)    \
-       while (drm_exec_cleanup(exec))
+       for (const bool __in_drm_exec_while_not_all_locked = true; \
+            drm_exec_cleanup(exec); )
 
 /**
  * drm_exec_continue_on_contention - continue the loop when we need to cleanup
@@ -83,6 +85,25 @@ struct drm_exec {
        if (unlikely(drm_exec_is_contended(exec)))      \
                continue
 
+/**
+ * drm_exec_try_prepare_obj - Try prepare an object and retry on contention
+ * @exec: drm_exec object
+ * @obj: GEM object to prepare
+ * @num_fence: number of fence slots to reserve
+ *
+ * Wrapper around drm_exec_prepare_obj() that automatically retries on
+ * contention by going back to the head of the drm_exec_while_not_all_locked()
+ * loop.
+ */
+#define drm_exec_try_prepare_obj(exec, obj, num_fences) \
+       ({ \
+               int __ret = drm_exec_prepare_obj(exec, obj, num_fences); \
+               static_assert(__in_drm_exec_while_not_all_locked == true); \
+               if (unlikely(drm_exec_is_contended(exec))) \
+                       continue; \
+               __ret; \
+       })
+
 /**
  * drm_exec_break_on_contention - break a subordinal loop on contention
  * @exec: drm_exec object

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-14 13:02       ` Boris Brezillon
@ 2023-06-17 11:54         ` Boris Brezillon
  2023-06-19  8:59           ` Thomas Hellström (Intel)
  0 siblings, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-17 11:54 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	thomas_os

+Matthew, who's been using drm_exec in Xe if I'm correct.

Hello Christian,

On Wed, 14 Jun 2023 15:02:52 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Wed, 14 Jun 2023 14:30:53 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 14.06.23 um 14:23 schrieb Boris Brezillon:  
> > > Hi Christian,
> > >
> > > On Thu,  4 May 2023 13:51:47 +0200
> > > "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
> > >    
> > >> This adds the infrastructure for an execution context for GEM buffers
> > >> which is similar to the existing TTMs execbuf util and intended to replace
> > >> it in the long term.
> > >>
> > >> The basic functionality is that we abstracts the necessary loop to lock
> > >> many different GEM buffers with automated deadlock and duplicate handling.    
> > > As many other drivers do already, we are considering using drm_exec()
> > > for our resv locking in the PowerVR driver, so we might have more
> > > questions/comments in the coming days/weeks, but I already have a
> > > couple right now (see below).
> > >    
> > >> v3: drop duplicate tracking, radeon is really the only one needing that    
> > > I think we'd actually be interested in duplicate tracking. Is there any
> > > way we can make it an optional feature through some extra helpers/flags?
> > > Doesn't have to be done in this patch series, I'm just wondering if this
> > > is something we can share as well.    
> > 
> > You can still capture the -EALREADY error and act appropriately in your 
> > driver.
> > 
> > For radeon it just means ignoring the error code and going ahead, but 
> > that behavior doesn't seem to be desired in most cases.
> > 
> > Initially I though we need to separately track how many and how often 
> > BOs are duplicated, but there is simply no use for this.
> >   
> > >
> > > [...]
> > >    
> > >> +/**
> > >> + * DOC: Overview
> > >> + *
> > >> + * This component mainly abstracts the retry loop necessary for locking
> > >> + * multiple GEM objects while preparing hardware operations (e.g. command
> > >> + * submissions, page table updates etc..).
> > >> + *
> > >> + * If a contention is detected while locking a GEM object the cleanup procedure
> > >> + * unlocks all previously locked GEM objects and locks the contended one first
> > >> + * before locking any further objects.
> > >> + *
> > >> + * After an object is locked fences slots can optionally be reserved on the
> > >> + * dma_resv object inside the GEM object.
> > >> + *
> > >> + * A typical usage pattern should look like this::
> > >> + *
> > >> + *	struct drm_gem_object *obj;
> > >> + *	struct drm_exec exec;
> > >> + *	unsigned long index;
> > >> + *	int ret;
> > >> + *
> > >> + *	drm_exec_init(&exec, true);
> > >> + *	drm_exec_while_not_all_locked(&exec) {
> > >> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > >> + *		drm_exec_continue_on_contention(&exec);
> > >> + *		if (ret)
> > >> + *			goto error;
> > >> + *    
> > > Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> > > combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> > >
> > > #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> > >          ({ \
> > >                  int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> > >                  if (unlikely(drm_exec_is_contended(exec))) \
> > >                          continue; \
> > >                  __ret; \
> > >          })
> > >
> > > This way the following pattern
> > >
> > > 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> > > 		drm_exec_continue_on_contention(&exec);
> > > 		if (ret)
> > > 			goto error;
> > >
> > > can be turned into something more conventional:
> > >
> > > 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> > > 		if (ret)
> > > 			goto error;    
> > 
> > Yeah, I was considering that as well. But then abandoned it as to 
> > complicated.
> > 
> > I really need to find some time to work on that anyway.

I've been playing with drm_exec for a couple of weeks now, and I wanted
to share something I hacked up to try and make the API simpler and
more robust against misuse (see the diff below, which is a slightly
adjusted version of your work).

In this version, the user is no longer in control of the retry
loop. Instead, they provide an expression (a call to a
sub-function) that is re-evaluated each time a contention is
detected. IMHO, this makes the 'prepare-objs' functions easier to
follow, and avoids mistakes like calling
drm_exec_continue_on_contention() in an inner loop, or breaking
out of the drm_exec_while_not_all_locked() loop unintentionally.

It also makes the internal management a bit simpler, since we
no longer call drm_exec_cleanup() on the first attempt, and can
thus get rid of the DRM_EXEC_DUMMY trick.

In the below diff, I also re-introduced native support for
duplicates as an opt-in, so we don't have to do things like:

	ret = drm_exec_prepare_obj(exec, obj, num_fences);
	if (ret == -EALREADY)
		ret = dma_resv_reserve_fences(obj->resv, num_fences);
	if (ret)
		return ret;

and can just do:

	ret = drm_exec_prepare_obj(exec, obj, num_fences);
	if (ret)
		return ret;

Of course drivers can open-code a wrapper doing the same thing, but
given at least pvr and radeon need this, it'd be nice if the core
could support it natively.
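
For the record, the open-coded driver-side wrapper would be something
like the below (the pvr_ prefix and exact shape are just for
illustration, untested):

	static int pvr_exec_prepare_obj(struct drm_exec *exec,
					struct drm_gem_object *obj,
					unsigned int num_fences)
	{
		int ret;

		ret = drm_exec_prepare_obj(exec, obj, num_fences);

		/* Treat duplicates as success, but still reserve the slots. */
		if (ret == -EALREADY)
			ret = dma_resv_reserve_fences(obj->resv, num_fences);

		return ret;
	}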

That's mostly it. Just wanted to share what I had in case you're
interested. If not, that's fine too.

Regards,

Boris
---
 Documentation/gpu/drm-mm.rst |  12 ++
 drivers/gpu/drm/Kconfig      |   6 +
 drivers/gpu/drm/Makefile     |   2 +
 drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
 include/drm/drm_exec.h       | 130 +++++++++++++++++
 5 files changed, 424 insertions(+)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index fe40ee686f6e..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -524,6 +524,18 @@ DRM Sync Objects
 .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
    :export:
 
+DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
 GPU Scheduler
 =============
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 76991720637c..01a38fcdb1c4 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -194,6 +194,12 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_EXEC
+	tristate
+	depends on DRM
+	help
+	  Execution context for command submissions
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 1873f64db171..18a02eaf2d49 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
 #
 # Memory-management helpers
 #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..e0ad1a3e1610
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc..).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked fences slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ * int prepare_objs_func(struct drm_exec *exec, ...)
+ * {
+ *	struct drm_gem_object *boA, *boB;
+ * 	int ret;
+ *
+ *	<retrieve boA and boB here>
+ *
+ *	ret = drm_exec_prepare_obj(exec, boA, 1);
+ *	if (ret)
+ *		return ret;
+ *
+ *	ret = drm_exec_prepare_obj(exec, boB, 1);
+ *	if (ret)
+ *		return ret;
+ *
+ * 	return 0;
+ * }
+ *
+ * int some_func()
+ * {
+ *	struct drm_exec exec;
+ *	unsigned long index;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, DRM_EXEC_FLAG_INTERRUPTIBLE);
+ *	ret = drm_exec_until_all_locked(&exec, prepare_objs_func(&exec, ...));
+ *	if (ret)
+ *		goto error;
+ *
+ *	drm_exec_for_each_locked_object(&exec, index, obj) {
+ *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *		...
+ *	}
+ *	drm_exec_fini(&exec);
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_unlock(obj->resv);
+		drm_gem_object_put(obj);
+	}
+
+	drm_gem_object_put(exec->prelocked);
+	exec->prelocked = NULL;
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @flags: combination of DRM_EXEC_FLAG_* flags
+ *
+ * Initialize the object and make sure that we can track locked objects.
+ */
+void drm_exec_init(struct drm_exec *exec, u32 flags)
+{
+	exec->flags = flags;
+	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/* If allocation here fails, just delay that till the first use */
+	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
+	exec->num_objects = 0;
+	exec->contended = NULL;
+	exec->prelocked = NULL;
+	ww_acquire_init(&exec->ticket, &reservation_ww_class);
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+	drm_exec_unlock_all(exec);
+	kvfree(exec->objects);
+	if (exec->contended)
+		drm_gem_object_put(exec->contended);
+	ww_acquire_fini(&exec->ticket);
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_reset - reset a drm_exec object after a contention
+ * @exec: the drm_exec object to reset
+ *
+ * Unlock all locked objects and resets the number of objects locked.
+ */
+void drm_exec_reset(struct drm_exec *exec)
+{
+	WARN_ON(!exec->contended);
+	drm_exec_unlock_all(exec);
+	exec->num_objects = 0;
+}
+EXPORT_SYMBOL(drm_exec_reset);
+
+/* Track the locked object in the array */
+static int drm_exec_obj_locked(struct drm_exec *exec,
+			       struct drm_gem_object *obj)
+{
+	if (unlikely(exec->num_objects == exec->max_objects)) {
+		size_t size = exec->max_objects * sizeof(void *);
+		void *tmp;
+
+		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
+				GFP_KERNEL);
+		if (!tmp)
+			return -ENOMEM;
+
+		exec->objects = tmp;
+		exec->max_objects += PAGE_SIZE / sizeof(void *);
+	}
+	drm_gem_object_get(obj);
+	exec->objects[exec->num_objects++] = obj;
+
+	return 0;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+	struct drm_gem_object *obj = exec->contended;
+	int ret;
+
+	if (likely(!obj))
+		return 0;
+
+	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
+		ret = dma_resv_lock_slow_interruptible(obj->resv,
+						       &exec->ticket);
+		if (unlikely(ret))
+			goto error_dropref;
+	} else {
+		dma_resv_lock_slow(obj->resv, &exec->ticket);
+	}
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (unlikely(ret)) {
+		dma_resv_unlock(obj->resv);
+		goto error_dropref;
+	}
+
+	swap(exec->prelocked, obj);
+
+error_dropref:
+	/* Always cleanup the contention so that error handling can kick in */
+	drm_gem_object_put(obj);
+	exec->contended = NULL;
+	return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -EALREADY when the object is
+ * already locked and DRM_EXEC_FLAG_ALLOW_DUPLICATES is not set, -ENOMEM when
+ * memory allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences)
+{
+	int ret;
+
+	ret = drm_exec_lock_contended(exec);
+	if (unlikely(ret))
+		return ret;
+
+	if (exec->prelocked == obj) {
+		drm_gem_object_put(exec->prelocked);
+		exec->prelocked = NULL;
+
+		return dma_resv_reserve_fences(obj->resv, num_fences);
+	}
+
+	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
+		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+	else
+		ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+	if (unlikely(ret == -EDEADLK)) {
+		drm_gem_object_get(obj);
+		exec->contended = obj;
+		return -EDEADLK;
+	}
+
+	if (unlikely(ret == -EALREADY &&
+	    (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
+		goto reserve_fences;
+
+	if (unlikely(ret))
+		return ret;
+
+	ret = drm_exec_obj_locked(exec, obj);
+	if (ret)
+		goto error_unlock;
+
+reserve_fences:
+	/* Keep locked when reserving fences fails */
+	return dma_resv_reserve_fences(obj->resv, num_fences);
+
+error_unlock:
+	dma_resv_unlock(obj->resv);
+	return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+/**
+ * drm_exec_prepare_array - helper to prepare an array of objects
+ * @exec: the drm_exec object with the state
+ * @objects: array of GEM object to prepare
+ * @num_objects: number of GEM objects in the array
+ * @num_fences: number of fences to reserve on each GEM object
+ *
+ * Prepares all GEM objects in an array, handles contention but aborts on first
+ * error otherwise. Reserves @num_fences on each GEM object after locking it.
+ *
+ * Returns: -EALREADY when object is already locked, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences)
+{
+	int ret;
+
+	for (unsigned int i = 0; i < num_objects; ++i) {
+		ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_exec_prepare_array);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..b1a5da0509c1
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * enum drm_exec_flags - Execution context flags
+ */
+enum drm_exec_flags {
+	/**
+	 * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible locking
+	 * functions.
+	 */
+	DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
+
+	/**
+	 * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY errors.
+	 */
+	DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
+};
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+	/**
+	 * @flags: Combinations of DRM_EXEC_FLAG_* flags.
+	 */
+	u32 flags;
+
+	/**
+	 * @ticket: WW ticket used for acquiring locks
+	 */
+	struct ww_acquire_ctx	ticket;
+
+	/**
+	 * @num_objects: number of objects locked
+	 */
+	unsigned int		num_objects;
+
+	/**
+	 * @max_objects: maximum objects in array
+	 */
+	unsigned int		max_objects;
+
+	/**
+	 * @objects: array of the locked objects
+	 */
+	struct drm_gem_object	**objects;
+
+	/**
+	 * @contended: contended GEM object we backed off for
+	 */
+	struct drm_gem_object	*contended;
+
+	/**
+	 * @prelocked: already locked GEM object due to contention
+	 */
+	struct drm_gem_object *prelocked;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj)	\
+	for (index = 0, obj = (exec)->objects[0];		\
+	     index < (exec)->num_objects;			\
+	     ++index, obj = (exec)->objects[index])
+
+/**
+ * drm_exec_until_all_locked - retry objects preparation until all objects
+ * are locked
+ * @exec: drm_exec object
+ * @expr: expression to be evaluated on each attempt
+ *
+ * This helper tries to prepare objects and if a deadlock is detected,
+ * rollbacks and retries.
+ *
+ * @expr is typically a function that tries to prepare objects using
+ * drm_exec_prepare_obj().
+ *
+ * If we take drm_exec_prepare_array() as an example, you should do:
+ *
+ *	ret = drm_exec_until_all_locked(exec,
+ *					drm_exec_prepare_array(exec,
+ *							       objs,
+ *							       num_objs,
+ *							       num_fences));
+ *	if (ret)
+ *		goto error_path;
+ *
+ *	...
+ *
+ * Returns: 0 on success, a negative error code on failure.
+ */
+#define drm_exec_until_all_locked(exec, expr)		\
+	({						\
+		__label__ retry;			\
+		int __ret;				\
+retry:							\
+		__ret = expr;				\
+		if ((exec)->contended) {		\
+			WARN_ON(__ret != -EDEADLK);	\
+			drm_exec_reset(exec);		\
+			goto retry;			\
+		}					\
+		ww_acquire_done(&(exec)->ticket);	\
+		__ret;					\
+	})
+
+void drm_exec_init(struct drm_exec *exec, u32 flags);
+void drm_exec_fini(struct drm_exec *exec);
+void drm_exec_reset(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+			 unsigned int num_fences);
+int drm_exec_prepare_array(struct drm_exec *exec,
+			   struct drm_gem_object **objects,
+			   unsigned int num_objects,
+			   unsigned int num_fences);
+
+#endif

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-17 11:54         ` Boris Brezillon
@ 2023-06-19  8:59           ` Thomas Hellström (Intel)
  2023-06-19  9:20             ` Christian König
  2023-06-19 10:12             ` Boris Brezillon
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Hellström (Intel) @ 2023-06-19  8:59 UTC (permalink / raw)
  To: Boris Brezillon, Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel


On 6/17/23 13:54, Boris Brezillon wrote:
> +Matthew who's been using drm_exec in Xe if I'm correct.
>
> Hello Christian,
>
> On Wed, 14 Jun 2023 15:02:52 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
>> On Wed, 14 Jun 2023 14:30:53 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 14.06.23 um 14:23 schrieb Boris Brezillon:
>>>> Hi Christian,
>>>>
>>>> On Thu,  4 May 2023 13:51:47 +0200
>>>> "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>     
>>>>> This adds the infrastructure for an execution context for GEM buffers
>>>>> which is similar to the existing TTMs execbuf util and intended to replace
>>>>> it in the long term.
>>>>>
>>>>> The basic functionality is that we abstracts the necessary loop to lock
>>>>> many different GEM buffers with automated deadlock and duplicate handling.
>>>> As many other drivers do already, we are considering using drm_exec()
>>>> for our resv locking in the PowerVR driver, so we might have more
>>>> questions/comments in the coming days/weeks, but I already have a
>>>> couple right now (see below).
>>>>     
>>>>> v3: drop duplicate tracking, radeon is really the only one needing that
>>>> I think we'd actually be interested in duplicate tracking. Is there any
>>>> way we can make it an optional feature through some extra helpers/flags?
>>>> Doesn't have to be done in this patch series, I'm just wondering if this
>>>> is something we can share as well.
>>> You can still capture the -EALREADY error and act appropriately in your
>>> driver.
>>>
>>> For radeon it just means ignoring the error code and going ahead, but
>>> that behavior doesn't seem to be desired in most cases.
>>>
>>> Initially I thought we needed to separately track how many and how often
>>> BOs are duplicated, but there is simply no use for this.
>>>    
>>>> [...]
>>>>     
>>>>> +/**
>>>>> + * DOC: Overview
>>>>> + *
>>>>> + * This component mainly abstracts the retry loop necessary for locking
>>>>> + * multiple GEM objects while preparing hardware operations (e.g. command
>>>>> + * submissions, page table updates etc..).
>>>>> + *
>>>>> + * If a contention is detected while locking a GEM object the cleanup procedure
>>>>> + * unlocks all previously locked GEM objects and locks the contended one first
>>>>> + * before locking any further objects.
>>>>> + *
>>>>> + * After an object is locked fences slots can optionally be reserved on the
>>>>> + * dma_resv object inside the GEM object.
>>>>> + *
>>>>> + * A typical usage pattern should look like this::
>>>>> + *
>>>>> + *	struct drm_gem_object *obj;
>>>>> + *	struct drm_exec exec;
>>>>> + *	unsigned long index;
>>>>> + *	int ret;
>>>>> + *
>>>>> + *	drm_exec_init(&exec, true);
>>>>> + *	drm_exec_while_not_all_locked(&exec) {
>>>>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
>>>>> + *		drm_exec_continue_on_contention(&exec);
>>>>> + *		if (ret)
>>>>> + *			goto error;
>>>>> + *
>>>> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
>>>> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
>>>>
>>>> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
>>>>           ({ \
>>>>                   int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
>>>>                   if (unlikely(drm_exec_is_contended(exec))) \
>>>>                           continue; \
>>>>                   __ret; \
>>>>           })
>>>>
>>>> This way the following pattern
>>>>
>>>> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
>>>> 		drm_exec_continue_on_contention(&exec);
>>>> 		if (ret)
>>>> 			goto error;
>>>>
>>>> can be turned into something more conventional:
>>>>
>>>> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
>>>> 		if (ret)
>>>> 			goto error;
>>> Yeah, I was considering that as well. But then abandoned it as too
>>> complicated.
>>>
>>> I really need to find some time to work on that anyway.
> I've been playing with drm_exec for a couple weeks now, and I wanted
> to share something I hacked to try and make the API simpler and
> more robust against misuse (see the below diff, which is a slightly
> adjusted version of your work).

It would be good if we could have someone take charge of this series 
and address all review comments. I see some of my comments getting lost, 
we have multiple submitters, and I can't find a dri-devel patchwork entry 
for this. Anyway, some comments below.

>
> In this version, the user is no longer in control of the retry
> loop. Instead, it provides an expression (a call to a
> sub-function) to be re-evaluated each time a contention is
> detected. IMHO, this makes the 'prepare-objs' functions easier to
> apprehend, and avoids any mistake like calling
> drm_exec_continue_on_contention() in an inner loop, or breaking
> out of the drm_exec_while_all_locked() loop unintentionally.

In i915 we've had a very similar helper to this, and while I agree this 
newer version would probably help make the code cleaner, OTOH there are 
also some places where a short drm_exec_while_all_locked()-like block 
doesn't really motivate a separate function. Porting i915 to the current 
version will take some work; for the xe driver both versions would work 
fine.

Some additional review comments not related to the interface change below:

>
> It also makes the internal management a bit simpler, since we
> no longer call drm_exec_cleanup() on the first attempt, and can
> thus get rid of the DRM_EXEC_DUMMY trick.
>
> In the below diff, I also re-introduced native support for
> duplicates as an opt-in, so we don't have to do things like:
>
> 	ret = drm_exec_prepare_obj(exec, obj, num_fences);
> 	if (ret == -EALREADY)
> 		ret = dma_resv_reserve_fences(obj->resv, num_fences);
> 	if (ret)
> 		return ret;
>
> and can just do:
>
> 	ret = drm_exec_prepare_obj(exec, obj, num_fences);
> 	if (ret)
> 		return;
>
> Of course drivers can open-code a wrapper doing the same thing, but
> given at least pvr and radeon need this, it'd be nice if the core
> could support it natively.
>
> That's mostly it. Just wanted to share what I had in case you're
> interested. If not, that's fine too.
>
> Regards,
>
> Boris
> ---
>   Documentation/gpu/drm-mm.rst |  12 ++
>   drivers/gpu/drm/Kconfig      |   6 +
>   drivers/gpu/drm/Makefile     |   2 +
>   drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>   5 files changed, 424 insertions(+)
>   create mode 100644 drivers/gpu/drm/drm_exec.c
>   create mode 100644 include/drm/drm_exec.h
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index fe40ee686f6e..c9f120cfe730 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -524,6 +524,18 @@ DRM Sync Objects
>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>      :export:
>   
> +DRM Execution context
> +=====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :doc: Overview
> +
> +.. kernel-doc:: include/drm/drm_exec.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
> +   :export:
> +
>   GPU Scheduler
>   =============
>   
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 76991720637c..01a38fcdb1c4 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -194,6 +194,12 @@ config DRM_TTM
>   	  GPU memory types. Will be enabled automatically if a device driver
>   	  uses it.
>   
> +config DRM_EXEC
> +	tristate
> +	depends on DRM
> +	help
> +	  Execution context for command submissions
> +
>   config DRM_BUDDY
>   	tristate
>   	depends on DRM
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 1873f64db171..18a02eaf2d49 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
>   #
>   # Memory-management helpers
>   #
> +#
> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>   
>   obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>   
> diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
> new file mode 100644
> index 000000000000..e0ad1a3e1610
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_exec.c
> @@ -0,0 +1,274 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-resv.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * This component mainly abstracts the retry loop necessary for locking
> + * multiple GEM objects while preparing hardware operations (e.g. command
> + * submissions, page table updates etc..).
> + *
> + * If a contention is detected while locking a GEM object the cleanup procedure
> + * unlocks all previously locked GEM objects and locks the contended one first
> + * before locking any further objects.
> + *
> + * After an object is locked fences slots can optionally be reserved on the
> + * dma_resv object inside the GEM object.
> + *
> + * A typical usage pattern should look like this::
> + *
> + * int prepare_objs_func(struct drm_exec *exec, ...)
> + * {
> + *	struct drm_gem_object *boA, *boB;
> + * 	int ret;
> + *
> + *	<retrieve boA and boB here>
> + *
> + *	ret = drm_exec_prepare_obj(&exec, boA, 1);
> + *	if (ret)
> + *		return ret;
> + *
> + *	ret = drm_exec_prepare_obj(&exec, boB, 1);
> + *	if (ret)
> + *		return ret;
> + *
> + * 	return 0;
> + * }
> + *
> + * int some_func()
> + * {
> + *	struct drm_exec exec;
> + *	unsigned long index;
> + *	int ret;
> + *
> + *	drm_exec_init(&exec, true);
> + *	ret = drm_exec_until_all_locked(&exec, prepare_objs_func(&exec, ...));
> + *	if (ret)
> + *		goto error;
> + *
> + *	drm_exec_for_each_locked_object(&exec, index, obj) {
> + *		dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
> + *		...
> + *	}
> + *	drm_exec_fini(&exec);
> + *
> + * See struct drm_exec for more details.
> + */
> +
> +/* Unlock all objects and drop references */
> +static void drm_exec_unlock_all(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_unlock(obj->resv);
> +		drm_gem_object_put(obj);
> +	}
> +
> +	drm_gem_object_put(exec->prelocked);
> +	exec->prelocked = NULL;
> +}
> +
> +/**
> + * drm_exec_init - initialize a drm_exec object
> + * @exec: the drm_exec object to initialize
> + * @interruptible: if locks should be acquired interruptible
> + *
> + * Initialize the object and make sure that we can track locked objects.
> + */
> +void drm_exec_init(struct drm_exec *exec, u32 flags)
> +{
> +	exec->flags = flags;
> +	exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	/* If allocation here fails, just delay that till the first use */
> +	exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
> +	exec->num_objects = 0;
> +	exec->contended = NULL;
> +	exec->prelocked = NULL;
> +	ww_acquire_init(&exec->ticket, &reservation_ww_class);
> +}
> +EXPORT_SYMBOL(drm_exec_init);
> +
> +/**
> + * drm_exec_fini - finalize a drm_exec object
> + * @exec: the drm_exec object to finalize
> + *
> + * Unlock all locked objects, drop the references to objects and free all memory
> + * used for tracking the state.
> + */
> +void drm_exec_fini(struct drm_exec *exec)
> +{
> +	drm_exec_unlock_all(exec);
> +	kvfree(exec->objects);
> +	if (exec->contended)
> +		drm_gem_object_put(exec->contended);
> +	ww_acquire_fini(&exec->ticket);
> +}
> +EXPORT_SYMBOL(drm_exec_fini);
> +
> +/**
> + * drm_exec_reset - reset a drm_exec object after a contention
> + * @exec: the drm_exec object to reset
> + *
> + * Unlock all locked objects and resets the number of objects locked.
> + */
> +void drm_exec_reset(struct drm_exec *exec)
> +{
> +	WARN_ON(!exec->contended);
> +	drm_exec_unlock_all(exec);
> +	exec->num_objects = 0;
> +}
> +EXPORT_SYMBOL(drm_exec_reset);
> +
> +/* Track the locked object in the array */
> +static int drm_exec_obj_locked(struct drm_exec *exec,
> +			       struct drm_gem_object *obj)
> +{
> +	if (unlikely(exec->num_objects == exec->max_objects)) {
> +		size_t size = exec->max_objects * sizeof(void *);
> +		void *tmp;
> +
> +		tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> +				GFP_KERNEL);
> +		if (!tmp)
> +			return -ENOMEM;

Sometimes you need to just temporarily lock an object and then unlock it 
again if it goes out of scope before reaching the end of 
_until_all_locked(). In that case you might need to remove a lock from 
the array. I *think* for all use-cases in i915 it would suffice to take 
a snapshot of num_objects and unlock everything above that, having 
exec->objects behave like a stack (see the sketch below). But was a list 
ever considered instead of a realloced array?
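
To illustrate the stack-like usage I have in mind (helper names made up,
not even compile-tested):

	/* Remember how many objects are locked before a temporary scope. */
	static inline unsigned int drm_exec_snapshot(struct drm_exec *exec)
	{
		return exec->num_objects;
	}

	/* Unlock and drop everything locked after the snapshot, LIFO order. */
	static void drm_exec_unwind_to(struct drm_exec *exec, unsigned int snap)
	{
		while (exec->num_objects > snap) {
			struct drm_gem_object *obj =
				exec->objects[--exec->num_objects];

			dma_resv_unlock(obj->resv);
			drm_gem_object_put(obj);
		}
	}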

> +
> +		exec->objects = tmp;
> +		exec->max_objects += PAGE_SIZE / sizeof(void *);
> +	}
> +	drm_gem_object_get(obj);
> +	exec->objects[exec->num_objects++] = obj;
> +
> +	return 0;
> +}
> +
> +/* Make sure the contended object is locked first */
> +static int drm_exec_lock_contended(struct drm_exec *exec)
> +{
> +	struct drm_gem_object *obj = exec->contended;
> +	int ret;
> +
> +	if (likely(!obj))
> +		return 0;
> +
> +	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
> +		ret = dma_resv_lock_slow_interruptible(obj->resv,
> +						       &exec->ticket);
> +		if (unlikely(ret))
> +			goto error_dropref;
> +	} else {
> +		dma_resv_lock_slow(obj->resv, &exec->ticket);
> +	}
> +

Sometimes you want to just drop the contended lock after the above 
relaxation and not add it as prelocked, if the contended object goes out 
of scope. Eviction would in some situations be one such example; 
-EDEADLOCK leading to an error path where the object should otherwise be 
freed is another. Perhaps we could add an argument to prepare_obj() as to 
whether the object should be immediately put after relaxation (rough 
sketch below).
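
Roughly something like this (parameter name and placement made up, just
to show the idea):

	/*
	 * With @keep_on_contention == false the contended object is only
	 * used to relax the ww transaction and is unlocked and dropped
	 * again, instead of being stashed as ->prelocked.
	 */
	int drm_exec_prepare_obj(struct drm_exec *exec,
				 struct drm_gem_object *obj,
				 unsigned int num_fences,
				 bool keep_on_contention);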

> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (unlikely(ret)) {
> +		dma_resv_unlock(obj->resv);
> +		goto error_dropref;
> +	}
> +
> +	swap(exec->prelocked, obj);
> +
> +error_dropref:
> +	/* Always cleanup the contention so that error handling can kick in */
> +	drm_gem_object_put(obj);
> +	exec->contended = NULL;
> +	return ret;
> +}
> +
> +/**
> + * drm_exec_prepare_obj - prepare a GEM object for use
> + * @exec: the drm_exec object with the state
> + * @obj: the GEM object to prepare
> + * @num_fences: how many fences to reserve
> + *
> + * Prepare a GEM object for use by locking it and reserving fence slots. All
> + * successfully locked objects are put into the locked container.
> + *
> + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
> + * already locked, -ENOMEM when memory allocation failed and zero for success.
> + */
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences)
> +{
> +	int ret;
> +
> +	ret = drm_exec_lock_contended(exec);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (exec->prelocked == obj) {
> +		drm_gem_object_put(exec->prelocked);
> +		exec->prelocked = NULL;
> +
> +		return dma_resv_reserve_fences(obj->resv, num_fences);
> +	}
> +
> +	if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
> +		ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
> +	else
> +		ret = dma_resv_lock(obj->resv, &exec->ticket);
> +
> +	if (unlikely(ret == -EDEADLK)) {
> +		drm_gem_object_get(obj);
> +		exec->contended = obj;
> +		return -EDEADLK;
> +	}
> +
> +	if (unlikely(ret == -EALREADY &&
> +	    (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
> +		goto reserve_fences;
> +
> +	if (unlikely(ret))
> +		return ret;
> +
> +	ret = drm_exec_obj_locked(exec, obj);
> +	if (ret)
> +		goto error_unlock;
> +
> +reserve_fences:
> +	/* Keep locked when reserving fences fails */
> +	return dma_resv_reserve_fences(obj->resv, num_fences);

Ugh, what is the use-case for keeping things locked here? How would a 
caller tell the difference between an error where everything is locked 
and one where nothing is locked? IMO, we should unlock on error here. If 
there indeed is a use-case, we should add a separate function for 
reserving fences on all locked objects (something like the helper 
sketched below), rather than sometimes returning locked on error and 
sometimes not.
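
I.e. something like the below (name made up), called once after the
locking loop instead of reserving inside prepare_obj():

	/* Reserve @num_fences slots on every currently locked object. */
	int drm_exec_reserve_fences(struct drm_exec *exec,
				    unsigned int num_fences)
	{
		struct drm_gem_object *obj;
		unsigned long index;
		int ret;

		drm_exec_for_each_locked_object(exec, index, obj) {
			ret = dma_resv_reserve_fences(obj->resv, num_fences);
			if (ret)
				return ret;
		}

		return 0;
	}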

Thanks,

Thomas


> +
> +error_unlock:
> +	dma_resv_unlock(obj->resv);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_obj);
> +
> +/**
> + * drm_exec_prepare_array - helper to prepare an array of objects
> + * @exec: the drm_exec object with the state
> + * @objects: array of GEM object to prepare
> + * @num_objects: number of GEM objects in the array
> + * @num_fences: number of fences to reserve on each GEM object
> + *
> + * Prepares all GEM objects in an array, handles contention but aborts on first
> + * error otherwise. Reserves @num_fences on each GEM object after locking it.
> + *
> + * Returns: -EALREADY when object is already locked, -ENOMEM when memory
> + * allocation failed and zero for success.
> + */
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences)
> +{
> +	int ret;
> +
> +	for (unsigned int i = 0; i < num_objects; ++i) {
> +		ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_exec_prepare_array);
> +
> +MODULE_DESCRIPTION("DRM execution context");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> new file mode 100644
> index 000000000000..b1a5da0509c1
> --- /dev/null
> +++ b/include/drm/drm_exec.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +
> +#ifndef __DRM_EXEC_H__
> +#define __DRM_EXEC_H__
> +
> +#include <linux/ww_mutex.h>
> +
> +struct drm_gem_object;
> +
> +/**
> + * enum drm_exec_flags - Execution context flags
> + */
> +enum drm_exec_flags {
> +	/**
> +	 * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible locking
> +	 * functions.
> +	 */
> +	DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
> +
> +	/**
> +	 * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY errors.
> +	 */
> +	DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
> +};
> +
> +/**
> + * struct drm_exec - Execution context
> + */
> +struct drm_exec {
> +	/**
> +	 * @flags: Combinations of DRM_EXEC_FLAG_* flags.
> +	 */
> +	u32 flags;
> +
> +	/**
> +	 * @ticket: WW ticket used for acquiring locks
> +	 */
> +	struct ww_acquire_ctx	ticket;
> +
> +	/**
> +	 * @num_objects: number of objects locked
> +	 */
> +	unsigned int		num_objects;
> +
> +	/**
> +	 * @max_objects: maximum objects in array
> +	 */
> +	unsigned int		max_objects;
> +
> +	/**
> +	 * @objects: array of the locked objects
> +	 */
> +	struct drm_gem_object	**objects;
> +
> +	/**
> +	 * @contended: contended GEM object we backed off for
> +	 */
> +	struct drm_gem_object	*contended;
> +
> +	/**
> +	 * @prelocked: already locked GEM object due to contention
> +	 */
> +	struct drm_gem_object *prelocked;
> +};
> +
> +/**
> + * drm_exec_for_each_locked_object - iterate over all the locked objects
> + * @exec: drm_exec object
> + * @index: unsigned long index for the iteration
> + * @obj: the current GEM object
> + *
> + * Iterate over all the locked GEM objects inside the drm_exec object.
> + */
> +#define drm_exec_for_each_locked_object(exec, index, obj)	\
> +	for (index = 0, obj = (exec)->objects[0];		\
> +	     index < (exec)->num_objects;			\
> +	     ++index, obj = (exec)->objects[index])
> +
> +/**
> + * drm_exec_until_all_locked - retry objects preparation until all objects
> + * are locked
> + * @exec: drm_exec object
> + * @expr: expression to be evaluated on each attempt
> + *
> + * This helper tries to prepare objects and if a deadlock is detected,
> + * rollbacks and retries.
> + *
> + * @expr is typically a function that tries to prepare objects using
> + * drm_exec_prepare_obj().
> + *
> + * If we take drm_exec_prepare_array() as an example, you should do:
> + *
> + *	ret = drm_exec_until_all_locked(exec,
> + *					drm_exec_prepare_array(exec,
> + *							       objs,
> + *							       num_objs,
> + *							       num_fences));
> + *	if (ret)
> + *		goto error_path;
> + *
> + *	...
> + *
> + * Returns: 0 on success, a negative error code on failure.
> + */
> +#define drm_exec_until_all_locked(exec, expr)		\
> +	({						\
> +		__label__ retry;			\
> +		int __ret;				\
> +retry:							\
> +		__ret = expr;				\
> +		if ((exec)->contended) {		\
> +			WARN_ON(__ret != -EDEADLK);	\
> +			drm_exec_reset(exec);		\
> +			goto retry;			\
> +		}					\
> +		ww_acquire_done(&(exec)->ticket);	\
> +		__ret;					\
> +	})
> +
> +void drm_exec_init(struct drm_exec *exec, u32 flags);
> +void drm_exec_fini(struct drm_exec *exec);
> +void drm_exec_reset(struct drm_exec *exec);
> +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
> +			 unsigned int num_fences);
> +int drm_exec_prepare_array(struct drm_exec *exec,
> +			   struct drm_gem_object **objects,
> +			   unsigned int num_objects,
> +			   unsigned int num_fences);
> +
> +#endif

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  8:59           ` Thomas Hellström (Intel)
@ 2023-06-19  9:20             ` Christian König
  2023-06-19  9:33               ` Thomas Hellström (Intel)
  2023-06-19 10:23               ` Boris Brezillon
  2023-06-19 10:12             ` Boris Brezillon
  1 sibling, 2 replies; 50+ messages in thread
From: Christian König @ 2023-06-19  9:20 UTC (permalink / raw)
  To: Thomas Hellström (Intel), Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel

Hi guys,

Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
> [SNIP]
>>>>
>>>> I really need to find some time to work on that anyway.
>> I've been playing with drm_exec for a couple weeks now, and I wanted
>> to share something I hacked to try and make the API simpler and
>> more robust against misuse (see the below diff, which is a slightly
>> adjusted version of your work).
>
> It would be good if we could have someone taking charge of this series 
> and address all review comments, I see some of my comments getting 
> lost, we have multiple submitters and I can't find a dri-devel 
> patchwork entry for this. Anyway some comments below.

I can try to find some time for the series this week (as long as nobody 
comes along with a burning roof).

>
>>
>> In this version, the user is no longer in control of the retry
>> loop. Instead, it provides an expression (a call to a
>> sub-function) to be re-evaluated each time a contention is
>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>> apprehend, and avoids any mistake like calling
>> drm_exec_continue_on_contention() in an inner loop, or breaking
>> out of the drm_exec_while_all_locked() loop unintentionally.
>
> In i915 we've had a very similar helper to this, and while I agree 
> this newer version would probably help make code cleaner, but OTOH 
> there also are some places where the short drm_exec_while_all_locked() 
> -likeblock don't really motivate a separate function. Porting i915 to 
> the current version will take some work, For  the xe driver both 
> versions would work fine.

Yeah, this is actually what my first version of this looked like. But I 
abandoned that approach because we have a lot of cases where we just 
quickly want to lock a few GEM objects and don't want the extra overhead 
of putting all the state into some bag to forward it to a function.
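
Just to spell that out: for the quick cases the loop based interface
keeps everything local, following the pattern from the v4 documentation
(with bo being whatever object is at hand):

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, true);
	drm_exec_while_not_all_locked(&exec) {
		ret = drm_exec_prepare_obj(&exec, bo, 1);
		drm_exec_continue_on_contention(&exec);
		if (ret)
			goto error;
	}
	/* ... add fences, submit ... */
	drm_exec_fini(&exec);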

>
> Some additional review comments not related to the interface change 
> below:
>
>>
>> It also makes the internal management a bit simpler, since we
>> no longer call drm_exec_cleanup() on the first attempt, and can
>> thus get rid of the DRM_EXEC_DUMMY trick.
>>
>> In the below diff, I also re-introduced native support for
>> duplicates as an opt-in, so we don't have to do things like:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret == -EALREADY)
>>         ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>     if (ret)
>>         return ret;
>>
>> and can just do:
>>
>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>     if (ret)
>>         return;
>>
>> Of course drivers can open-code a wrapper doing the same thing, but
>> given at least pvr and radeon need this, it'd be nice if the core
>> could support it natively.
>>
>> That's mostly it. Just wanted to share what I had in case you're
>> interested. If not, that's fine too.
>>
>> Regards,
>>
>> Boris
>> ---
>>   Documentation/gpu/drm-mm.rst |  12 ++
>>   drivers/gpu/drm/Kconfig      |   6 +
>>   drivers/gpu/drm/Makefile     |   2 +
>>   drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
>>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>>   5 files changed, 424 insertions(+)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 include/drm/drm_exec.h
>>
>> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
>> index fe40ee686f6e..c9f120cfe730 100644
>> --- a/Documentation/gpu/drm-mm.rst
>> +++ b/Documentation/gpu/drm-mm.rst
>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>      :export:
>>   +DRM Execution context
>> +=====================
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :doc: Overview
>> +
>> +.. kernel-doc:: include/drm/drm_exec.h
>> +   :internal:
>> +
>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>> +   :export:
>> +
>>   GPU Scheduler
>>   =============
>>   diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index 76991720637c..01a38fcdb1c4 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -194,6 +194,12 @@ config DRM_TTM
>>         GPU memory types. Will be enabled automatically if a device 
>> driver
>>         uses it.
>>   +config DRM_EXEC
>> +    tristate
>> +    depends on DRM
>> +    help
>> +      Execution context for command submissions
>> +
>>   config DRM_BUDDY
>>       tristate
>>       depends on DRM
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 1873f64db171..18a02eaf2d49 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>> drm_panel_orientation_quirks.o
>>   #
>>   # Memory-management helpers
>>   #
>> +#
>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>     obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>   diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>> new file mode 100644
>> index 000000000000..e0ad1a3e1610
>> --- /dev/null
>> +++ b/drivers/gpu/drm/drm_exec.c
>> @@ -0,0 +1,274 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#include <drm/drm_exec.h>
>> +#include <drm/drm_gem.h>
>> +#include <linux/dma-resv.h>
>> +
>> +/**
>> + * DOC: Overview
>> + *
>> + * This component mainly abstracts the retry loop necessary for locking
>> + * multiple GEM objects while preparing hardware operations (e.g. 
>> command
>> + * submissions, page table updates etc..).
>> + *
>> + * If a contention is detected while locking a GEM object the 
>> cleanup procedure
>> + * unlocks all previously locked GEM objects and locks the contended 
>> one first
>> + * before locking any further objects.
>> + *
>> + * After an object is locked fences slots can optionally be reserved 
>> on the
>> + * dma_resv object inside the GEM object.
>> + *
>> + * A typical usage pattern should look like this::
>> + *
>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>> + * {
>> + *    struct drm_gem_object *boA, *boB;
>> + *     int ret;
>> + *
>> + *    <retrieve boA and boB here>
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boA, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *    ret = drm_exec_prepare_obj(&exec, boB, 1);
>> + *    if (ret)
>> + *        return ret;
>> + *
>> + *     return 0;
>> + * }
>> + *
>> + * int some_func()
>> + * {
>> + *    struct drm_exec exec;
>> + *    unsigned long index;
>> + *    int ret;
>> + *
>> + *    drm_exec_init(&exec, true);
>> + *    ret = drm_exec_until_all_locked(&exec, 
>> prepare_objs_func(&exec, ...));
>> + *    if (ret)
>> + *        goto error;
>> + *
>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>> + *        ...
>> + *    }
>> + *    drm_exec_fini(&exec);
>> + *
>> + * See struct drm_exec for more details.
>> + */
>> +
>> +/* Unlock all objects and drop references */
>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj;
>> +    unsigned long index;
>> +
>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>> +        dma_resv_unlock(obj->resv);
>> +        drm_gem_object_put(obj);
>> +    }
>> +
>> +    drm_gem_object_put(exec->prelocked);
>> +    exec->prelocked = NULL;
>> +}
>> +
>> +/**
>> + * drm_exec_init - initialize a drm_exec object
>> + * @exec: the drm_exec object to initialize
>> + * @interruptible: if locks should be acquired interruptible
>> + *
>> + * Initialize the object and make sure that we can track locked 
>> objects.
>> + */
>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>> +{
>> +    exec->flags = flags;
>> +    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +    /* If allocation here fails, just delay that till the first use */
>> +    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
>> +    exec->num_objects = 0;
>> +    exec->contended = NULL;
>> +    exec->prelocked = NULL;
>> +    ww_acquire_init(&exec->ticket, &reservation_ww_class);
>> +}
>> +EXPORT_SYMBOL(drm_exec_init);
>> +
>> +/**
>> + * drm_exec_fini - finalize a drm_exec object
>> + * @exec: the drm_exec object to finalize
>> + *
>> + * Unlock all locked objects, drop the references to objects and 
>> free all memory
>> + * used for tracking the state.
>> + */
>> +void drm_exec_fini(struct drm_exec *exec)
>> +{
>> +    drm_exec_unlock_all(exec);
>> +    kvfree(exec->objects);
>> +    if (exec->contended)
>> +        drm_gem_object_put(exec->contended);
>> +    ww_acquire_fini(&exec->ticket);
>> +}
>> +EXPORT_SYMBOL(drm_exec_fini);
>> +
>> +/**
>> + * drm_exec_reset - reset a drm_exec object after a contention
>> + * @exec: the drm_exec object to reset
>> + *
>> + * Unlock all locked objects and resets the number of objects locked.
>> + */
>> +void drm_exec_reset(struct drm_exec *exec)
>> +{
>> +    WARN_ON(!exec->contended);
>> +    drm_exec_unlock_all(exec);
>> +    exec->num_objects = 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_reset);
>> +
>> +/* Track the locked object in the array */
>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>> +                   struct drm_gem_object *obj)
>> +{
>> +    if (unlikely(exec->num_objects == exec->max_objects)) {
>> +        size_t size = exec->max_objects * sizeof(void *);
>> +        void *tmp;
>> +
>> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>> +                GFP_KERNEL);
>> +        if (!tmp)
>> +            return -ENOMEM;
>
> Sometimes you need to just temporarily lock an object and then unlock 
> it again if it goes out of scope before reaching the end of 
> _until_all_locked(). In that case you might need to remove a lock from 
> the array. I *think* for all use-cases in i915 it would suffice to 
> take a snapshot of num_objects, and unlock everything above that, 
> having exec->objects behave like a stack, but was ever a list 
> considered instead of a realloced array?

Yes, the problem is that linked lists really suck regarding their cache 
line locality. That's why I came up with this approach here.

What we could maybe do is to allow unlocking objects, but with the cost 
of linear backward searching for them in the array.
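
Roughly like this, I guess (untested, name made up):

	void drm_exec_unlock_obj(struct drm_exec *exec,
				 struct drm_gem_object *obj)
	{
		unsigned int i;

		/* Recently locked objects are most likely at the end. */
		for (i = exec->num_objects; i--;) {
			if (exec->objects[i] != obj)
				continue;

			dma_resv_unlock(obj->resv);
			drm_gem_object_put(obj);

			/* Close the hole in the array. */
			memmove(&exec->objects[i], &exec->objects[i + 1],
				(--exec->num_objects - i) * sizeof(void *));
			return;
		}
	}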

>
>> +
>> +        exec->objects = tmp;
>> +        exec->max_objects += PAGE_SIZE / sizeof(void *);
>> +    }
>> +    drm_gem_object_get(obj);
>> +    exec->objects[exec->num_objects++] = obj;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Make sure the contended object is locked first */
>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>> +{
>> +    struct drm_gem_object *obj = exec->contended;
>> +    int ret;
>> +
>> +    if (likely(!obj))
>> +        return 0;
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>> +                               &exec->ticket);
>> +        if (unlikely(ret))
>> +            goto error_dropref;
>> +    } else {
>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>> +    }
>> +
>
> Sometimes you want to just drop the contended lock after the above 
> relaxation. (Eviction would be one example), and not add as prelocked, 
> if the contended object goes out of scope. Eviction would in some 
> situations be one such example, -EDEADLOCK leading to an error path 
> where the object should otherwise be freed is another. Perhaps we 
> could add an argument to prepare_obj() as to whether the object should 
> be immediately put after relaxation.

I was considering a try_prepare version as well; that should cover this 
use case.
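
Not fully thought through, but I'd imagine roughly the following, where a
lock that can't be taken immediately just reports back instead of going
through the contention/prelock dance (name and exact semantics are only a
guess here):

	int drm_exec_try_prepare_obj(struct drm_exec *exec,
				     struct drm_gem_object *obj,
				     unsigned int num_fences)
	{
		int ret;

		if (!dma_resv_trylock(obj->resv))
			return -EBUSY;

		ret = drm_exec_obj_locked(exec, obj);
		if (ret) {
			dma_resv_unlock(obj->resv);
			return ret;
		}

		return dma_resv_reserve_fences(obj->resv, num_fences);
	}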

>
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (unlikely(ret)) {
>> +        dma_resv_unlock(obj->resv);
>> +        goto error_dropref;
>> +    }
>> +
>> +    swap(exec->prelocked, obj);
>> +
>> +error_dropref:
>> +    /* Always cleanup the contention so that error handling can kick 
>> in */
>> +    drm_gem_object_put(obj);
>> +    exec->contended = NULL;
>> +    return ret;
>> +}
>> +
>> +/**
>> + * drm_exec_prepare_obj - prepare a GEM object for use
>> + * @exec: the drm_exec object with the state
>> + * @obj: the GEM object to prepare
>> + * @num_fences: how many fences to reserve
>> + *
>> + * Prepare a GEM object for use by locking it and reserving fence 
>> slots. All
>> + * successfully locked objects are put into the locked container.
>> + *
>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>> object is
>> + * already locked, -ENOMEM when memory allocation failed and zero 
>> for success.
>> + */
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    ret = drm_exec_lock_contended(exec);
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    if (exec->prelocked == obj) {
>> +        drm_gem_object_put(exec->prelocked);
>> +        exec->prelocked = NULL;
>> +
>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>> +    }
>> +
>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>> +    else
>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>> +
>> +    if (unlikely(ret == -EDEADLK)) {
>> +        drm_gem_object_get(obj);
>> +        exec->contended = obj;
>> +        return -EDEADLK;
>> +    }
>> +
>> +    if (unlikely(ret == -EALREADY &&
>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>> +        goto reserve_fences;
>> +
>> +    if (unlikely(ret))
>> +        return ret;
>> +
>> +    ret = drm_exec_obj_locked(exec, obj);
>> +    if (ret)
>> +        goto error_unlock;
>> +
>> +reserve_fences:
>> +    /* Keep locked when reserving fences fails */
>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>
> Ugh, what is the use-case for keeping things locked here? How would a 
> caller tell the difference between an error where everything is locked 
> and nothing is locked? IMO, we should unlock on error here. If there 
> indeed is a use-case we should add a separate function for reserving 
> fences for all locked objects, rather than returning sometimes locked 
> on error sometime not.

We return the object locked here because it was too much churn to remove 
it again from the array, and we are getting fully cleaned up at the end 
anyway.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>
>> +
>> +error_unlock:
>> +    dma_resv_unlock(obj->resv);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>> +
>> +/**
>> + * drm_exec_prepare_array - helper to prepare an array of objects
>> + * @exec: the drm_exec object with the state
>> + * @objects: array of GEM object to prepare
>> + * @num_objects: number of GEM objects in the array
>> + * @num_fences: number of fences to reserve on each GEM object
>> + *
>> + * Prepares all GEM objects in an array, handles contention but 
>> aborts on first
>> + * error otherwise. Reserves @num_fences on each GEM object after 
>> locking it.
>> + *
>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>> memory
>> + * allocation failed and zero for success.
>> + */
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences)
>> +{
>> +    int ret;
>> +
>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>> +
>> +MODULE_DESCRIPTION("DRM execution context");
>> +MODULE_LICENSE("Dual MIT/GPL");
>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>> new file mode 100644
>> index 000000000000..b1a5da0509c1
>> --- /dev/null
>> +++ b/include/drm/drm_exec.h
>> @@ -0,0 +1,130 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>> +
>> +#ifndef __DRM_EXEC_H__
>> +#define __DRM_EXEC_H__
>> +
>> +#include <linux/ww_mutex.h>
>> +
>> +struct drm_gem_object;
>> +
>> +/**
>> + * enum drm_exec_flags - Execution context flags
>> + */
>> +enum drm_exec_flags {
>> +    /**
>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use interruptible 
>> locking
>> +     * functions.
>> +     */
>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>> +
>> +    /**
>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow EALREADY 
>> errors.
>> +     */
>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>> +};
>> +
>> +/**
>> + * struct drm_exec - Execution context
>> + */
>> +struct drm_exec {
>> +    /**
>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>> +     */
>> +    u32 flags;
>> +
>> +    /**
>> +     * @ticket: WW ticket used for acquiring locks
>> +     */
>> +    struct ww_acquire_ctx    ticket;
>> +
>> +    /**
>> +     * @num_objects: number of objects locked
>> +     */
>> +    unsigned int        num_objects;
>> +
>> +    /**
>> +     * @max_objects: maximum objects in array
>> +     */
>> +    unsigned int        max_objects;
>> +
>> +    /**
>> +     * @objects: array of the locked objects
>> +     */
>> +    struct drm_gem_object    **objects;
>> +
>> +    /**
>> +     * @contended: contended GEM object we backed off for
>> +     */
>> +    struct drm_gem_object    *contended;
>> +
>> +    /**
>> +     * @prelocked: already locked GEM object due to contention
>> +     */
>> +    struct drm_gem_object *prelocked;
>> +};
>> +
>> +/**
>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>> objects
>> + * @exec: drm_exec object
>> + * @index: unsigned long index for the iteration
>> + * @obj: the current GEM object
>> + *
>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>> + */
>> +#define drm_exec_for_each_locked_object(exec, index, obj)    \
>> +    for (index = 0, obj = (exec)->objects[0];        \
>> +         index < (exec)->num_objects;            \
>> +         ++index, obj = (exec)->objects[index])
>> +
>> +/**
>> + * drm_exec_until_all_locked - retry objects preparation until all 
>> objects
>> + * are locked
>> + * @exec: drm_exec object
>> + * @expr: expression to be evaluated on each attempt
>> + *
>> + * This helper tries to prepare objects and if a deadlock is detected,
>> + * rollbacks and retries.
>> + *
>> + * @expr is typically a function that tries to prepare objects using
>> + * drm_exec_prepare_obj().
>> + *
>> + * If we take drm_exec_prepare_array() as an example, you should do:
>> + *
>> + *    ret = drm_exec_until_all_locked(exec,
>> + *                    drm_exec_prepare_array(exec,
>> + *                                   objs,
>> + *                                   num_objs,
>> + *                                   num_fences));
>> + *    if (ret)
>> + *        goto error_path;
>> + *
>> + *    ...
>> + *
>> + * Returns: 0 on success, a negative error code on failure.
>> + */
>> +#define drm_exec_until_all_locked(exec, expr)        \
>> +    ({                        \
>> +        __label__ retry;            \
>> +        int __ret;                \
>> +retry:                            \
>> +        __ret = expr;                \
>> +        if ((exec)->contended) {        \
>> +            WARN_ON(__ret != -EDEADLK);    \
>> +            drm_exec_reset(exec);        \
>> +            goto retry;            \
>> +        }                    \
>> +        ww_acquire_done(&(exec)->ticket);    \
>> +        __ret;                    \
>> +    })
>> +
>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>> +void drm_exec_fini(struct drm_exec *exec);
>> +void drm_exec_reset(struct drm_exec *exec);
>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>> drm_gem_object *obj,
>> +             unsigned int num_fences);
>> +int drm_exec_prepare_array(struct drm_exec *exec,
>> +               struct drm_gem_object **objects,
>> +               unsigned int num_objects,
>> +               unsigned int num_fences);
>> +
>> +#endif


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  9:20             ` Christian König
@ 2023-06-19  9:33               ` Thomas Hellström (Intel)
  2023-06-19  9:48                 ` Christian König
  2023-06-19 10:23               ` Boris Brezillon
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Hellström (Intel) @ 2023-06-19  9:33 UTC (permalink / raw)
  To: Christian König, Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel

Hi!

On 6/19/23 11:20, Christian König wrote:
> Hi guys,
>
> Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>>>
>>>>> I really need to find some time to work on that anyway.
>>> I've been playing with drm_exec for a couple weeks now, and I wanted
>>> to share something I hacked to try and make the API simpler and
>>> more robust against misuse (see the below diff, which is a slightly
>>> adjusted version of your work).
>>
>> It would be good if we could have someone taking charge of this 
>> series and address all review comments, I see some of my comments 
>> getting lost, we have multiple submitters and I can't find a 
>> dri-devel patchwork entry for this. Anyway some comments below.
>
> I can try to find some time for the series this week (As long as 
> nobody comes along and has any burning roof).
>
>>
>>>
>>> In this version, the user is no longer in control of the retry
>>> loop. Instead, it provides an expression (a call to a
>>> sub-function) to be re-evaluated each time a contention is
>>> detected. IMHO, this makes the 'prepare-objs' functions easier to
>>> apprehend, and avoids any mistake like calling
>>> drm_exec_continue_on_contention() in an inner loop, or breaking
>>> out of the drm_exec_while_all_locked() loop unintentionally.
>>
>> In i915 we've had a very similar helper to this, and while I agree 
>> this newer version would probably help make code cleaner, OTOH 
>> there also are some places where the short 
>> drm_exec_while_all_locked()-like blocks don't really motivate a 
>> separate function. Porting i915 to the current version will take some 
>> work. For the xe driver both versions would work fine.
>
> Yeah, this is actually what my first version of this looked like. But 
> I abandoned that approach because we have a lot of cases where we just 
> quickly want to lock a few GEM objects and don't want the extra 
> overhead of putting all the state into some bag to forward it to a 
> function.
>
>>
>> Some additional review comments not related to the interface change 
>> below:
>>
>>>
>>> It also makes the internal management a bit simpler, since we
>>> no longer call drm_exec_cleanup() on the first attempt, and can
>>> thus get rid of the DRM_EXEC_DUMMY trick.
>>>
>>> In the below diff, I also re-introduced native support for
>>> duplicates as an opt-in, so we don't have to do things like:
>>>
>>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>     if (ret == -EALREADY)
>>>         ret = dma_resv_reserve_fences(obj->resv, num_fences);
>>>     if (ret)
>>>         return ret;
>>>
>>> and can just do:
>>>
>>>     ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>     if (ret)
>>>         return;
>>>
>>> Of course drivers can open-code a wrapper doing the same thing, but
>>> given at least pvr and radeon need this, it'd be nice if the core
>>> could support it natively.
>>>
>>> That's mostly it. Just wanted to share what I had in case you're
>>> interested. If not, that's fine too.
>>>
>>> Regards,
>>>
>>> Boris
>>> ---
>>>   Documentation/gpu/drm-mm.rst |  12 ++
>>>   drivers/gpu/drm/Kconfig      |   6 +
>>>   drivers/gpu/drm/Makefile     |   2 +
>>>   drivers/gpu/drm/drm_exec.c   | 274 
>>> +++++++++++++++++++++++++++++++++++
>>>   include/drm/drm_exec.h       | 130 +++++++++++++++++
>>>   5 files changed, 424 insertions(+)
>>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>>   create mode 100644 include/drm/drm_exec.h
>>>
>>> diff --git a/Documentation/gpu/drm-mm.rst 
>>> b/Documentation/gpu/drm-mm.rst
>>> index fe40ee686f6e..c9f120cfe730 100644
>>> --- a/Documentation/gpu/drm-mm.rst
>>> +++ b/Documentation/gpu/drm-mm.rst
>>> @@ -524,6 +524,18 @@ DRM Sync Objects
>>>   .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
>>>      :export:
>>>   +DRM Execution context
>>> +=====================
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> +   :doc: Overview
>>> +
>>> +.. kernel-doc:: include/drm/drm_exec.h
>>> +   :internal:
>>> +
>>> +.. kernel-doc:: drivers/gpu/drm/drm_exec.c
>>> +   :export:
>>> +
>>>   GPU Scheduler
>>>   =============
>>>   diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>>> index 76991720637c..01a38fcdb1c4 100644
>>> --- a/drivers/gpu/drm/Kconfig
>>> +++ b/drivers/gpu/drm/Kconfig
>>> @@ -194,6 +194,12 @@ config DRM_TTM
>>>         GPU memory types. Will be enabled automatically if a device 
>>> driver
>>>         uses it.
>>>   +config DRM_EXEC
>>> +    tristate
>>> +    depends on DRM
>>> +    help
>>> +      Execution context for command submissions
>>> +
>>>   config DRM_BUDDY
>>>       tristate
>>>       depends on DRM
>>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>>> index 1873f64db171..18a02eaf2d49 100644
>>> --- a/drivers/gpu/drm/Makefile
>>> +++ b/drivers/gpu/drm/Makefile
>>> @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += 
>>> drm_panel_orientation_quirks.o
>>>   #
>>>   # Memory-management helpers
>>>   #
>>> +#
>>> +obj-$(CONFIG_DRM_EXEC) += drm_exec.o
>>>     obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>>>   diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
>>> new file mode 100644
>>> index 000000000000..e0ad1a3e1610
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/drm_exec.c
>>> @@ -0,0 +1,274 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#include <drm/drm_exec.h>
>>> +#include <drm/drm_gem.h>
>>> +#include <linux/dma-resv.h>
>>> +
>>> +/**
>>> + * DOC: Overview
>>> + *
>>> + * This component mainly abstracts the retry loop necessary for 
>>> locking
>>> + * multiple GEM objects while preparing hardware operations (e.g. 
>>> command
>>> + * submissions, page table updates etc..).
>>> + *
>>> + * If a contention is detected while locking a GEM object the 
>>> cleanup procedure
>>> + * unlocks all previously locked GEM objects and locks the 
>>> contended one first
>>> + * before locking any further objects.
>>> + *
>>> + * After an object is locked, fence slots can optionally be 
>>> reserved on the
>>> + * dma_resv object inside the GEM object.
>>> + *
>>> + * A typical usage pattern should look like this::
>>> + *
>>> + * int prepare_objs_func(struct drm_exec *exec, ...)
>>> + * {
>>> + *    struct drm_gem_object *boA, *boB;
>>> + *     int ret;
>>> + *
>>> + *    <retrieve boA and boB here>
>>> + *
>>> + *    ret = drm_exec_prepare_obj(&exec, boA, 1);
>>> + *    if (ret)
>>> + *        return ret;
>>> + *
>>> + *    ret = drm_exec_prepare_obj(&exec, boB, 1);
>>> + *    if (ret)
>>> + *        return ret;
>>> + *
>>> + *     return 0;
>>> + * }
>>> + *
>>> + * int some_func()
>>> + * {
>>> + *    struct drm_exec exec;
>>> + *    unsigned long index;
>>> + *    int ret;
>>> + *
>>> + *    drm_exec_init(&exec, true);
>>> + *    ret = drm_exec_until_all_locked(&exec, 
>>> prepare_objs_func(&exec, ...));
>>> + *    if (ret)
>>> + *        goto error;
>>> + *
>>> + *    drm_exec_for_each_locked_object(&exec, index, obj) {
>>> + *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
>>> + *        ...
>>> + *    }
>>> + *    drm_exec_fini(&exec);
>>> + *
>>> + * See struct drm_exec for more details.
>>> + */
>>> +
>>> +/* Unlock all objects and drop references */
>>> +static void drm_exec_unlock_all(struct drm_exec *exec)
>>> +{
>>> +    struct drm_gem_object *obj;
>>> +    unsigned long index;
>>> +
>>> +    drm_exec_for_each_locked_object(exec, index, obj) {
>>> +        dma_resv_unlock(obj->resv);
>>> +        drm_gem_object_put(obj);
>>> +    }
>>> +
>>> +    drm_gem_object_put(exec->prelocked);
>>> +    exec->prelocked = NULL;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_init - initialize a drm_exec object
>>> + * @exec: the drm_exec object to initialize
>>> + * @interruptible: if locks should be acquired interruptible
>>> + *
>>> + * Initialize the object and make sure that we can track locked 
>>> objects.
>>> + */
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags)
>>> +{
>>> +    exec->flags = flags;
>>> +    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>> +
>>> +    /* If allocation here fails, just delay that till the first use */
>>> +    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) 
>>> : 0;
>>> +    exec->num_objects = 0;
>>> +    exec->contended = NULL;
>>> +    exec->prelocked = NULL;
>>> +    ww_acquire_init(&exec->ticket, &reservation_ww_class);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_init);
>>> +
>>> +/**
>>> + * drm_exec_fini - finalize a drm_exec object
>>> + * @exec: the drm_exec object to finalize
>>> + *
>>> + * Unlock all locked objects, drop the references to objects and 
>>> free all memory
>>> + * used for tracking the state.
>>> + */
>>> +void drm_exec_fini(struct drm_exec *exec)
>>> +{
>>> +    drm_exec_unlock_all(exec);
>>> +    kvfree(exec->objects);
>>> +    if (exec->contended)
>>> +        drm_gem_object_put(exec->contended);
>>> +    ww_acquire_fini(&exec->ticket);
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_fini);
>>> +
>>> +/**
>>> + * drm_exec_reset - reset a drm_exec object after a contention
>>> + * @exec: the drm_exec object to reset
>>> + *
>>> + * Unlock all locked objects and reset the number of objects locked.
>>> + */
>>> +void drm_exec_reset(struct drm_exec *exec)
>>> +{
>>> +    WARN_ON(!exec->contended);
>>> +    drm_exec_unlock_all(exec);
>>> +    exec->num_objects = 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_reset);
>>> +
>>> +/* Track the locked object in the array */
>>> +static int drm_exec_obj_locked(struct drm_exec *exec,
>>> +                   struct drm_gem_object *obj)
>>> +{
>>> +    if (unlikely(exec->num_objects == exec->max_objects)) {
>>> +        size_t size = exec->max_objects * sizeof(void *);
>>> +        void *tmp;
>>> +
>>> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
>>> +                GFP_KERNEL);
>>> +        if (!tmp)
>>> +            return -ENOMEM;
>>
>> Sometimes you need to just temporarily lock an object and then unlock 
>> it again if it goes out of scope before reaching the end of 
>> _until_all_locked(). In that case you might need to remove a lock 
>> from the array. I *think* for all use-cases in i915 it would suffice 
>> to take a snapshot of num_objects, and unlock everything above that, 
>> having exec->objects behave like a stack, but was ever a list 
>> considered instead of a realloced array?
>
> Yes, the problem is that linked lists really suck regarding their 
> cache line locality. That's why I came up with this approach here.
>
> What we could maybe do is to allow unlocking objects, but with the 
> cost of linear backward searching for them in the array.
>
>>
>>> +
>>> +        exec->objects = tmp;
>>> +        exec->max_objects += PAGE_SIZE / sizeof(void *);
>>> +    }
>>> +    drm_gem_object_get(obj);
>>> +    exec->objects[exec->num_objects++] = obj;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/* Make sure the contended object is locked first */
>>> +static int drm_exec_lock_contended(struct drm_exec *exec)
>>> +{
>>> +    struct drm_gem_object *obj = exec->contended;
>>> +    int ret;
>>> +
>>> +    if (likely(!obj))
>>> +        return 0;
>>> +
>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
>>> +        ret = dma_resv_lock_slow_interruptible(obj->resv,
>>> +                               &exec->ticket);
>>> +        if (unlikely(ret))
>>> +            goto error_dropref;
>>> +    } else {
>>> +        dma_resv_lock_slow(obj->resv, &exec->ticket);
>>> +    }
>>> +
>>
>> Sometimes you want to just drop the contended lock after the above 
>> relaxation. (Eviction would be one example), and not add as 
>> prelocked, if the contended object goes out of scope. Eviction would 
>> in some situations be one such example, -EDEADLOCK leading to an 
>> error path where the object should otherwise be freed is another. 
>> Perhaps we could add an argument to prepare_obj() as to whether the 
>> object should be immediately put after relaxation.
>
> I was considering a try_prepare version as well, that should cover 
> this use case.

That sounds a bit different from this use-case. The use-case above 
would, on -EDEADLOCK actually unlock everything, then lock-slow the 
contending lock and then immediately unlock it and drop. It sounds like 
try_prepare would just skip locking and continue with everything locked 
so far still locked?
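
For reference, a minimal sketch of that "lock-slow, then immediately unlock 
and drop" step on top of the helpers in this patch. The helper name and its 
exact semantics are assumptions for illustration, not something the series 
provides:

/* Hypothetical helper, not part of this series: after backing off with
 * -EDEADLK, acquire the contended lock via the slow path so the
 * ww_acquire_ctx bookkeeping stays consistent, then release and drop it
 * right away because the object went out of scope and is not needed on
 * the next attempt.
 */
static void drm_exec_drop_contended(struct drm_exec *exec)
{
	struct drm_gem_object *obj = exec->contended;

	if (!obj)
		return;

	dma_resv_lock_slow(obj->resv, &exec->ticket);
	dma_resv_unlock(obj->resv);

	drm_gem_object_put(obj);
	exec->contended = NULL;
}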

>
>>
>>> +    ret = drm_exec_obj_locked(exec, obj);
>>> +    if (unlikely(ret)) {
>>> +        dma_resv_unlock(obj->resv);
>>> +        goto error_dropref;
>>> +    }
>>> +
>>> +    swap(exec->prelocked, obj);
>>> +
>>> +error_dropref:
>>> +    /* Always cleanup the contention so that error handling can 
>>> kick in */
>>> +    drm_gem_object_put(obj);
>>> +    exec->contended = NULL;
>>> +    return ret;
>>> +}
>>> +
>>> +/**
>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>> + * @exec: the drm_exec object with the state
>>> + * @obj: the GEM object to prepare
>>> + * @num_fences: how many fences to reserve
>>> + *
>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>> slots. All
>>> + * successfully locked objects are put into the locked container.
>>> + *
>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>> object is
>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>> for success.
>>> + */
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>> drm_gem_object *obj,
>>> +             unsigned int num_fences)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = drm_exec_lock_contended(exec);
>>> +    if (unlikely(ret))
>>> +        return ret;
>>> +
>>> +    if (exec->prelocked == obj) {
>>> +        drm_gem_object_put(exec->prelocked);
>>> +        exec->prelocked = NULL;
>>> +
>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>> +    }
>>> +
>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>> +    else
>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>> +
>>> +    if (unlikely(ret == -EDEADLK)) {
>>> +        drm_gem_object_get(obj);
>>> +        exec->contended = obj;
>>> +        return -EDEADLK;
>>> +    }
>>> +
>>> +    if (unlikely(ret == -EALREADY &&
>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>> +        goto reserve_fences;
>>> +
>>> +    if (unlikely(ret))
>>> +        return ret;
>>> +
>>> +    ret = drm_exec_obj_locked(exec, obj);
>>> +    if (ret)
>>> +        goto error_unlock;
>>> +
>>> +reserve_fences:
>>> +    /* Keep locked when reserving fences fails */
>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>
>> Ugh, what is the use-case for keeping things locked here? How would a 
>> caller tell the difference between an error where everything is 
>> locked and nothing is locked? IMO, we should unlock on error here. If 
>> there indeed is a use-case we should add a separate function for 
>> reserving fences for all locked objects, rather than returning 
>> sometimes locked on error sometime not.
>
> We return the object locked here because it was too much churn to 
> remove it again from the array and we are getting fully cleaned up at 
> the end anyway.

OK, so if we add an unlock functionality, we could just have a 
consistent locking state on error return?

Thanks,
Thomas

>
> Regards,
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>> +
>>> +error_unlock:
>>> +    dma_resv_unlock(obj->resv);
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>> +
>>> +/**
>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>> + * @exec: the drm_exec object with the state
>>> + * @objects: array of GEM object to prepare
>>> + * @num_objects: number of GEM objects in the array
>>> + * @num_fences: number of fences to reserve on each GEM object
>>> + *
>>> + * Prepares all GEM objects in an array, handles contention but 
>>> aborts on first
>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>> locking it.
>>> + *
>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>> memory
>>> + * allocation failed and zero for success.
>>> + */
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> +               struct drm_gem_object **objects,
>>> +               unsigned int num_objects,
>>> +               unsigned int num_fences)
>>> +{
>>> +    int ret;
>>> +
>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>> +        if (ret)
>>> +            return ret;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>> +
>>> +MODULE_DESCRIPTION("DRM execution context");
>>> +MODULE_LICENSE("Dual MIT/GPL");
>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>> new file mode 100644
>>> index 000000000000..b1a5da0509c1
>>> --- /dev/null
>>> +++ b/include/drm/drm_exec.h
>>> @@ -0,0 +1,130 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>> +
>>> +#ifndef __DRM_EXEC_H__
>>> +#define __DRM_EXEC_H__
>>> +
>>> +#include <linux/ww_mutex.h>
>>> +
>>> +struct drm_gem_object;
>>> +
>>> +/**
>>> + * enum drm_exec_flags - Execution context flags
>>> + */
>>> +enum drm_exec_flags {
>>> +    /**
>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>> interruptible locking
>>> +     * functions.
>>> +     */
>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>> +
>>> +    /**
>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>> EALREADY errors.
>>> +     */
>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>> +};
>>> +
>>> +/**
>>> + * struct drm_exec - Execution context
>>> + */
>>> +struct drm_exec {
>>> +    /**
>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>> +     */
>>> +    u32 flags;
>>> +
>>> +    /**
>>> +     * @ticket: WW ticket used for acquiring locks
>>> +     */
>>> +    struct ww_acquire_ctx    ticket;
>>> +
>>> +    /**
>>> +     * @num_objects: number of objects locked
>>> +     */
>>> +    unsigned int        num_objects;
>>> +
>>> +    /**
>>> +     * @max_objects: maximum objects in array
>>> +     */
>>> +    unsigned int        max_objects;
>>> +
>>> +    /**
>>> +     * @objects: array of the locked objects
>>> +     */
>>> +    struct drm_gem_object    **objects;
>>> +
>>> +    /**
>>> +     * @contended: contended GEM object we backed off for
>>> +     */
>>> +    struct drm_gem_object    *contended;
>>> +
>>> +    /**
>>> +     * @prelocked: already locked GEM object due to contention
>>> +     */
>>> +    struct drm_gem_object *prelocked;
>>> +};
>>> +
>>> +/**
>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>> objects
>>> + * @exec: drm_exec object
>>> + * @index: unsigned long index for the iteration
>>> + * @obj: the current GEM object
>>> + *
>>> + * Iterate over all the locked GEM objects inside the drm_exec object.
>>> + */
>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>> +         index < (exec)->num_objects;            \
>>> +         ++index, obj = (exec)->objects[index])
>>> +
>>> +/**
>>> + * drm_exec_until_all_locked - retry objects preparation until all 
>>> objects
>>> + * are locked
>>> + * @exec: drm_exec object
>>> + * @expr: expression to be evaluated on each attempt
>>> + *
>>> + * This helper tries to prepare objects and if a deadlock is detected,
>>> + * rollbacks and retries.
>>> + *
>>> + * @expr is typically a function that tries to prepare objects using
>>> + * drm_exec_prepare_obj().
>>> + *
>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>> + *
>>> + *    ret = drm_exec_until_all_locked(exec,
>>> + *                    drm_exec_prepare_array(exec,
>>> + *                                   objs,
>>> + *                                   num_objs,
>>> + *                                   num_fences));
>>> + *    if (ret)
>>> + *        goto error_path;
>>> + *
>>> + *    ...
>>> + *
>>> + * Returns: 0 on success, a negative error code on failure.
>>> + */
>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>> +    ({                        \
>>> +        __label__ retry;            \
>>> +        int __ret;                \
>>> +retry:                            \
>>> +        __ret = expr;                \
>>> +        if ((exec)->contended) {        \
>>> +            WARN_ON(__ret != -EDEADLK);    \
>>> +            drm_exec_reset(exec);        \
>>> +            goto retry;            \
>>> +        }                    \
>>> +        ww_acquire_done(&(exec)->ticket);    \
>>> +        __ret;                    \
>>> +    })
>>> +
>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>> +void drm_exec_fini(struct drm_exec *exec);
>>> +void drm_exec_reset(struct drm_exec *exec);
>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>> drm_gem_object *obj,
>>> +             unsigned int num_fences);
>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>> +               struct drm_gem_object **objects,
>>> +               unsigned int num_objects,
>>> +               unsigned int num_fences);
>>> +
>>> +#endif

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  9:33               ` Thomas Hellström (Intel)
@ 2023-06-19  9:48                 ` Christian König
  2023-06-19 11:06                   ` Thomas Hellström (Intel)
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-06-19  9:48 UTC (permalink / raw)
  To: Thomas Hellström (Intel), Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel

Hi,

Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel):
> [SNIP]
>>> Sometimes you want to just drop the contended lock after the above 
>>> relaxation. (Eviction would be one example), and not add as 
>>> prelocked, if the contended object goes out of scope. Eviction would 
>>> in some situations be one such example, -EDEADLOCK leading to an 
>>> error path where the object should otherwise be freed is another. 
>>> Perhaps we could add an argument to prepare_obj() as to whether the 
>>> object should be immediately put after relaxation.
>>
>> I was considering a try_prepare version as well, that should cover 
>> this use case.
>
> That sounds a bit different from this use-case. The use-case above 
> would, on -EDEADLOCK actually unlock everything, then lock-slow the 
> contending lock and then immediately unlock it and drop.

Hui? What would that be good for?

> It sounds like try_prepare would just skip locking and continue with 
> everything locked so far still locked?

Correct.
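
To make that idea concrete, a rough sketch of such a try_prepare variant 
could look like the following. The name, the -EBUSY error code and the 
exact semantics are assumptions, not part of the posted patch:

/* Hypothetical sketch: trylock-based prepare that never triggers the
 * ww back-off. On contention it simply fails and leaves every object
 * locked so far untouched. drm_exec_obj_locked() is the internal
 * tracking helper from the patch above.
 */
static int drm_exec_try_prepare_obj(struct drm_exec *exec,
				    struct drm_gem_object *obj,
				    unsigned int num_fences)
{
	int ret;

	if (!dma_resv_trylock(obj->resv))
		return -EBUSY;

	ret = drm_exec_obj_locked(exec, obj);
	if (ret) {
		dma_resv_unlock(obj->resv);
		return ret;
	}

	return dma_resv_reserve_fences(obj->resv, num_fences);
}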

>
>>
>>>
>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>> +    if (unlikely(ret)) {
>>>> +        dma_resv_unlock(obj->resv);
>>>> +        goto error_dropref;
>>>> +    }
>>>> +
>>>> +    swap(exec->prelocked, obj);
>>>> +
>>>> +error_dropref:
>>>> +    /* Always cleanup the contention so that error handling can 
>>>> kick in */
>>>> +    drm_gem_object_put(obj);
>>>> +    exec->contended = NULL;
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>> + * @exec: the drm_exec object with the state
>>>> + * @obj: the GEM object to prepare
>>>> + * @num_fences: how many fences to reserve
>>>> + *
>>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>>> slots. All
>>>> + * successfully locked objects are put into the locked container.
>>>> + *
>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>> object is
>>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>>> for success.
>>>> + */
>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>> drm_gem_object *obj,
>>>> +             unsigned int num_fences)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    ret = drm_exec_lock_contended(exec);
>>>> +    if (unlikely(ret))
>>>> +        return ret;
>>>> +
>>>> +    if (exec->prelocked == obj) {
>>>> +        drm_gem_object_put(exec->prelocked);
>>>> +        exec->prelocked = NULL;
>>>> +
>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>> +    }
>>>> +
>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>>> +    else
>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>> +
>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>> +        drm_gem_object_get(obj);
>>>> +        exec->contended = obj;
>>>> +        return -EDEADLK;
>>>> +    }
>>>> +
>>>> +    if (unlikely(ret == -EALREADY &&
>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>> +        goto reserve_fences;
>>>> +
>>>> +    if (unlikely(ret))
>>>> +        return ret;
>>>> +
>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>> +    if (ret)
>>>> +        goto error_unlock;
>>>> +
>>>> +reserve_fences:
>>>> +    /* Keep locked when reserving fences fails */
>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>
>>> Ugh, what is the use-case for keeping things locked here? How would 
>>> a caller tell the difference between an error where everything is 
>>> locked and nothing is locked? IMO, we should unlock on error here. 
>>> If there indeed is a use-case we should add a separate function for 
>>> reserving fences for all locked objects, rather than returning 
>>> sometimes locked on error sometime not.
>>
>> We return the object locked here because it was too much churn to 
>> remove it again from the array and we are getting fully cleaned up at 
>> the end anyway.
>
> OK, so if we add an unlock functionality, we could just have a 
> consistent locking state on error return?

Yeah, that should work. Going to work on this.
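
For illustration only, an unlock helper along the lines discussed above 
(linear backward search through the tracking array) might look roughly 
like this; the name and details are assumptions, not the eventual 
implementation:

/* Hypothetical sketch: unlock a single object again and remove it from
 * the array, searching backwards since recently locked objects are the
 * most likely candidates. Keeps the array dense so the iteration macro
 * keeps working.
 */
static void drm_exec_unlock_obj(struct drm_exec *exec,
				struct drm_gem_object *obj)
{
	unsigned int i;

	for (i = exec->num_objects; i--;) {
		if (exec->objects[i] != obj)
			continue;

		dma_resv_unlock(obj->resv);
		drm_gem_object_put(obj);

		--exec->num_objects;
		memmove(&exec->objects[i], &exec->objects[i + 1],
			(exec->num_objects - i) * sizeof(void *));
		return;
	}
}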

Regards,
Christian.

>
> Thanks,
> Thomas
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>> +
>>>> +error_unlock:
>>>> +    dma_resv_unlock(obj->resv);
>>>> +    return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>> +
>>>> +/**
>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>> + * @exec: the drm_exec object with the state
>>>> + * @objects: array of GEM object to prepare
>>>> + * @num_objects: number of GEM objects in the array
>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>> + *
>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>> aborts on first
>>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>>> locking it.
>>>> + *
>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>>> memory
>>>> + * allocation failed and zero for success.
>>>> + */
>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>> +               struct drm_gem_object **objects,
>>>> +               unsigned int num_objects,
>>>> +               unsigned int num_fences)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>> +
>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>> new file mode 100644
>>>> index 000000000000..b1a5da0509c1
>>>> --- /dev/null
>>>> +++ b/include/drm/drm_exec.h
>>>> @@ -0,0 +1,130 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>> +
>>>> +#ifndef __DRM_EXEC_H__
>>>> +#define __DRM_EXEC_H__
>>>> +
>>>> +#include <linux/ww_mutex.h>
>>>> +
>>>> +struct drm_gem_object;
>>>> +
>>>> +/**
>>>> + * enum drm_exec_flags - Execution context flags
>>>> + */
>>>> +enum drm_exec_flags {
>>>> +    /**
>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>> interruptible locking
>>>> +     * functions.
>>>> +     */
>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>> +
>>>> +    /**
>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>> EALREADY errors.
>>>> +     */
>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct drm_exec - Execution context
>>>> + */
>>>> +struct drm_exec {
>>>> +    /**
>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>> +     */
>>>> +    u32 flags;
>>>> +
>>>> +    /**
>>>> +     * @ticket: WW ticket used for acquiring locks
>>>> +     */
>>>> +    struct ww_acquire_ctx    ticket;
>>>> +
>>>> +    /**
>>>> +     * @num_objects: number of objects locked
>>>> +     */
>>>> +    unsigned int        num_objects;
>>>> +
>>>> +    /**
>>>> +     * @max_objects: maximum objects in array
>>>> +     */
>>>> +    unsigned int        max_objects;
>>>> +
>>>> +    /**
>>>> +     * @objects: array of the locked objects
>>>> +     */
>>>> +    struct drm_gem_object    **objects;
>>>> +
>>>> +    /**
>>>> +     * @contended: contended GEM object we backed off for
>>>> +     */
>>>> +    struct drm_gem_object    *contended;
>>>> +
>>>> +    /**
>>>> +     * @prelocked: already locked GEM object due to contention
>>>> +     */
>>>> +    struct drm_gem_object *prelocked;
>>>> +};
>>>> +
>>>> +/**
>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>> objects
>>>> + * @exec: drm_exec object
>>>> + * @index: unsigned long index for the iteration
>>>> + * @obj: the current GEM object
>>>> + *
>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>> object.
>>>> + */
>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>>> +         index < (exec)->num_objects;            \
>>>> +         ++index, obj = (exec)->objects[index])
>>>> +
>>>> +/**
>>>> + * drm_exec_until_all_locked - retry objects preparation until all 
>>>> objects
>>>> + * are locked
>>>> + * @exec: drm_exec object
>>>> + * @expr: expression to be evaluated on each attempt
>>>> + *
>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>> detected,
>>>> + * rollbacks and retries.
>>>> + *
>>>> + * @expr is typically a function that tries to prepare objects using
>>>> + * drm_exec_prepare_obj().
>>>> + *
>>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>>> + *
>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>> + *                    drm_exec_prepare_array(exec,
>>>> + *                                   objs,
>>>> + *                                   num_objs,
>>>> + *                                   num_fences));
>>>> + *    if (ret)
>>>> + *        goto error_path;
>>>> + *
>>>> + *    ...
>>>> + *
>>>> + * Returns: 0 on success, a negative error code on failure.
>>>> + */
>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>> +    ({                        \
>>>> +        __label__ retry;            \
>>>> +        int __ret;                \
>>>> +retry:                            \
>>>> +        __ret = expr;                \
>>>> +        if ((exec)->contended) {        \
>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>> +            drm_exec_reset(exec);        \
>>>> +            goto retry;            \
>>>> +        }                    \
>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>> +        __ret;                    \
>>>> +    })
>>>> +
>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>> drm_gem_object *obj,
>>>> +             unsigned int num_fences);
>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>> +               struct drm_gem_object **objects,
>>>> +               unsigned int num_objects,
>>>> +               unsigned int num_fences);
>>>> +
>>>> +#endif


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  8:59           ` Thomas Hellström (Intel)
  2023-06-19  9:20             ` Christian König
@ 2023-06-19 10:12             ` Boris Brezillon
  2023-06-19 10:44               ` Christian König
  1 sibling, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-19 10:12 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	Christian König

Hello Thomas,

On Mon, 19 Jun 2023 10:59:16 +0200
Thomas Hellström (Intel) <thomas_os@shipmail.org> wrote:

> >>>>       
> >>>>> +/**
> >>>>> + * DOC: Overview
> >>>>> + *
> >>>>> + * This component mainly abstracts the retry loop necessary for locking
> >>>>> + * multiple GEM objects while preparing hardware operations (e.g. command
> >>>>> + * submissions, page table updates etc..).
> >>>>> + *
> >>>>> + * If a contention is detected while locking a GEM object the cleanup procedure
> >>>>> + * unlocks all previously locked GEM objects and locks the contended one first
> >>>>> + * before locking any further objects.
> >>>>> + *
> >>>>> + * After an object is locked, fence slots can optionally be reserved on the
> >>>>> + * dma_resv object inside the GEM object.
> >>>>> + *
> >>>>> + * A typical usage pattern should look like this::
> >>>>> + *
> >>>>> + *	struct drm_gem_object *obj;
> >>>>> + *	struct drm_exec exec;
> >>>>> + *	unsigned long index;
> >>>>> + *	int ret;
> >>>>> + *
> >>>>> + *	drm_exec_init(&exec, true);
> >>>>> + *	drm_exec_while_not_all_locked(&exec) {
> >>>>> + *		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >>>>> + *		drm_exec_continue_on_contention(&exec);
> >>>>> + *		if (ret)
> >>>>> + *			goto error;
> >>>>> + *  
> >>>> Have you considered defining a drm_exec_try_prepare_obj_or_retry()
> >>>> combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()?
> >>>>
> >>>> #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \
> >>>>           ({ \
> >>>>                   int __ret = drm_exec_prepare_obj(exec, bo, num_fences); \
> >>>>                   if (unlikely(drm_exec_is_contended(exec))) \
> >>>>                           continue; \
> >>>>                   __ret; \
> >>>>           })
> >>>>
> >>>> This way the following pattern
> >>>>
> >>>> 		ret = drm_exec_prepare_obj(&exec, boA, 1);
> >>>> 		drm_exec_continue_on_contention(&exec);
> >>>> 		if (ret)
> >>>> 			goto error;
> >>>>
> >>>> can be turned into something more conventional:
> >>>>
> >>>> 		ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1);
> >>>> 		if (ret)
> >>>> 			goto error;  
> >>> Yeah, I was considering that as well. But then abandoned it as to
> >>> complicated.
> >>>
> >>> I really need to find some time to work on that anyway.  
> > I've been playing with drm_exec for a couple weeks now, and I wanted
> > to share something I hacked to try and make the API simpler and
> > more robust against misuse (see the below diff, which is a slightly
> > adjusted version of your work).  
> 
> It would be good if we could have someone taking charge of this series 
> and address all review comments, I see some of my comments getting lost, 
> we have multiple submitters and I can't find a dri-devel patchwork entry 
> for this.

My bad, I wasn't intending to submit a new version. I just added a
diff to show what I had in mind. This being said, it'd be great if we
could make some progress on this series, because we have quite a few
drivers depending on it now.

> 
> >
> > In this version, the user is no longer in control of the retry
> > loop. Instead, it provides an expression (a call to a
> > sub-function) to be re-evaluated each time a contention is
> > detected. IMHO, this makes the 'prepare-objs' functions easier to
> > apprehend, and avoids any mistake like calling
> > drm_exec_continue_on_contention() in an inner loop, or breaking
> > out of the drm_exec_while_all_locked() loop unintentionally.  
> 
> In i915 we've had a very similar helper to this, and while I agree this 
> newer version would probably help make code cleaner, OTOH there also 
> are some places where the short drm_exec_while_all_locked()-like blocks 
> don't really motivate a separate function. Porting i915 to the current 
> version will take some work. For the xe driver both versions would work 
> fine.

Note that the drm_exec_until_all_locked() helper I introduced is taking
an expression, so in theory, you don't have to define a separate
function.

	drm_exec_until_all_locked(&exec, {
		/* inlined-code */
		int ret;

		ret = blabla()
		if (ret)
			goto error;

		...

error:
		/* return value. */
		ret;
	});

This being said, as soon as you have several failure paths,
it makes things a lot easier/controllable if you make it a function,
and I honestly don't think the readability would suffer from having a
function defined just above the user. My main concern with the original
approach was the risk of calling continue/break_if_contended() in the
wrong place, and also the fact you can't really externalize things to
a function if you're looking for a cleaner split. At least with
drm_exec_until_all_locked() you can do both.
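
To make the comparison concrete, the function-split variant with the 
expression-based macro from the diff above would look roughly like this 
(the function names and the single reserved fence are illustrative only):

static int prepare_objs(struct drm_exec *exec, struct drm_gem_object *boA,
			struct drm_gem_object *boB)
{
	int ret;

	ret = drm_exec_prepare_obj(exec, boA, 1);
	if (ret)
		return ret;

	return drm_exec_prepare_obj(exec, boB, 1);
}

static int lock_both(struct drm_gem_object *boA, struct drm_gem_object *boB)
{
	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, DRM_EXEC_FLAG_INTERRUPTIBLE);
	ret = drm_exec_until_all_locked(&exec, prepare_objs(&exec, boA, boB));
	if (ret)
		goto out;

	/* everything is locked here, add fences / do the submission */

out:
	drm_exec_fini(&exec);
	return ret;
}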

Regards,

Boris

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  9:20             ` Christian König
  2023-06-19  9:33               ` Thomas Hellström (Intel)
@ 2023-06-19 10:23               ` Boris Brezillon
  1 sibling, 0 replies; 50+ messages in thread
From: Boris Brezillon @ 2023-06-19 10:23 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

On Mon, 19 Jun 2023 11:20:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Hi guys,
> 
> Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
> > [SNIP]  
> >>>>
> >>>> I really need to find some time to work on that anyway.  
> >> I've been playing with drm_exec for a couple weeks now, and I wanted
> >> to share something I hacked to try and make the API simpler and
> >> more robust against misuse (see the below diff, which is a slightly
> >> adjusted version of your work).  
> >
> > It would be good if we could have someone taking charge of this series 
> > and address all review comments, I see some of my comments getting 
> > lost, we have multiple submitters and I can't find a dri-devel 
> > patchwork entry for this. Anyway some comments below.  
> 
> I can try to find some time for the series this week (As long as nobody 
> comes along and has any burning roof).

That's great news!

> 
> >  
> >>
> >> In this version, the user is no longer in control of the retry
> >> loop. Instead, it provides an expression (a call to a
> >> sub-function) to be re-evaluated each time a contention is
> >> detected. IMHO, this makes the 'prepare-objs' functions easier to
> >> apprehend, and avoids any mistake like calling
> >> drm_exec_continue_on_contention() in an inner loop, or breaking
> >> out of the drm_exec_while_all_locked() loop unintentionally.  
> >
> > In i915 we've had a very similar helper to this, and while I agree 
> > this newer version would probably help make code cleaner, OTOH 
> > there also are some places where the short 
> > drm_exec_while_all_locked()-like blocks don't really motivate a 
> > separate function. Porting i915 to the current version will take some 
> > work. For the xe driver both versions would work fine.
> 
> Yeah, this is actually what my first version of this looked like. But I 
> abandoned that approach because we have a lot of cases where we just 
> quickly want to lock a few GEM objects and don't want the extra overhead 
> of putting all the state into some bag to forward it to a function.

If you're talking about verbosity, it might be the case, though I guess
it's mostly a matter of taste (I do like when things are well isolated).
As for runtime overhead, I'd expect the compiler to inline the function
anyway, so it's unlikely to change anything.

> >> +/* Track the locked object in the array */
> >> +static int drm_exec_obj_locked(struct drm_exec *exec,
> >> +                   struct drm_gem_object *obj)
> >> +{
> >> +    if (unlikely(exec->num_objects == exec->max_objects)) {
> >> +        size_t size = exec->max_objects * sizeof(void *);
> >> +        void *tmp;
> >> +
> >> +        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
> >> +                GFP_KERNEL);
> >> +        if (!tmp)
> >> +            return -ENOMEM;  
> >
> > Sometimes you need to just temporarily lock an object and then unlock 
> > it again if it goes out of scope before reaching the end of 
> > _until_all_locked(). In that case you might need to remove a lock from 
> > the array. I *think* for all use-cases in i915 it would suffice to 
> > take a snapshot of num_objects, and unlock everything above that, 
> > having exec->objects behave like a stack, but was ever a list 
> > considered instead of a realloced array?  
> 
> Yes, the problem is that linked lists really suck regarding their cache 
> line locality. That's why I came up with this approach here.

Hm, maybe I'm missing something, but if you place the list_head obj you
use to stack the locked objects close enough to the resv pointer, and
aligned on cache line, it shouldn't really be a problem, given you have
to dereference the GEM object to retrieve its resv anyway.
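
A toy illustration of that layout (field names are made up, this is not 
the real struct drm_gem_object):

struct toy_gem_object {
	/* ... unrelated fields ... */

	/* Kept adjacent so that walking a locked-objects list touches the
	 * same cache line that already has to be loaded to reach the
	 * reservation object.
	 */
	struct dma_resv *resv;
	struct list_head exec_node;

	/* ... more fields ... */
};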

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 10:12             ` Boris Brezillon
@ 2023-06-19 10:44               ` Christian König
  2023-06-19 11:05                 ` Boris Brezillon
  2023-06-19 12:29                 ` Boris Brezillon
  0 siblings, 2 replies; 50+ messages in thread
From: Christian König @ 2023-06-19 10:44 UTC (permalink / raw)
  To: Boris Brezillon, Thomas Hellström (Intel)
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel

Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> [SNIP]
> Note that the drm_exec_until_all_locked() helper I introduced is taking
> an expression, so in theory, you don't have to define a separate
> function.
>
> 	drm_exec_until_all_locked(&exec, {
> 		/* inlined-code */
> 		int ret;
>
> 		ret = blabla()
> 		if (ret)
> 			goto error;
>
> 		...
>
> error:
> 		/* return value. */
> 		ret;
> 	});
>
> This being said, as soon as you have several failure paths,
> it makes things a lot easier/controllable if you make it a function,
> and I honestly don't think the readability would suffer from having a
> function defined just above the user. My main concern with the original
> approach was the risk of calling continue/break_if_contended() in the
> wrong place, and also the fact you can't really externalize things to
> a function if you're looking for a cleaner split. At least with
> drm_exec_until_all_locked() you can do both.

Yeah, but that means that you can't use return inside your code block 
and instead have to define an error label for handling "normal" 
contention, which is what I'm trying to avoid here.

How about:

#define drm_exec_until_all_locked(exec)    \
         __drm_exec_retry: if (drm_exec_cleanup(exec))


#define drm_exec_retry_on_contention(exec)              \
         if (unlikely(drm_exec_is_contended(exec)))      \
                 goto __drm_exec_retry


And then use it like:

drm_exec_until_all_locked(exec)
{
     ret = drm_exec_prepare_obj(exec, obj);
     drm_exec_retry_on_contention(exec);
}

The only problem I can see with this is that the __drm_exec_retry label 
would be function local.

Regards,
Christian.

>
> Regards,
>
> Boris


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 10:44               ` Christian König
@ 2023-06-19 11:05                 ` Boris Brezillon
  2023-06-19 12:01                   ` Boris Brezillon
  2023-06-19 12:29                 ` Boris Brezillon
  1 sibling, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-19 11:05 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

On Mon, 19 Jun 2023 12:44:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> > [SNIP]
> > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > an expression, so in theory, you don't have to define a separate
> > function.
> >
> > 	drm_exec_until_all_locked(&exec, {
> > 		/* inlined-code */
> > 		int ret;
> >
> > 		ret = blabla()
> > 		if (ret)
> > 			goto error;
> >
> > 		...
> >
> > error:
> > 		/* return value. */
> > 		ret;
> > 	});
> >
> > This being said, as soon as you have several failure paths,
> > it makes things a lot easier/controllable if you make it a function,
> > and I honestly don't think the readability would suffer from having a
> > function defined just above the user. My main concern with the original
> > approach was the risk of calling continue/break_if_contended() in the
> > wrong place, and also the fact you can't really externalize things to
> > a function if you're looking for a cleaner split. At least with
> > drm_exec_until_all_locked() you can do both.  
> 
> Yeah, but that means that you can't use return inside your code block 
> and instead have to define an error label for handling "normal" 
> contention, which is what I'm trying to avoid here.
> 
> How about:
> 
> #define drm_exec_until_all_locked(exec)    \
>          __drm_exec_retry: if (drm_exec_cleanup(exec))
> 
> 
> #define drm_exec_retry_on_contention(exec)              \
>          if (unlikely(drm_exec_is_contended(exec)))      \
>                  goto __drm_exec_retry
> 
> 
> And then use it like:
> 
> drm_exec_until_all_locked(exec)
> {
>      ret = drm_exec_prepare_obj(exec, obj);
>      drm_exec_retry_on_contention(exec);
> }

That would work, and I was about to suggest extending my proposal with
a drm_exec_retry_on_contention() to support both use cases. The only
downside is the fact you might be able to break out of a loop that has
local variables, which will leak stack space.

> 
> The only problem I can see with this is that the __drm_exec_retry label 
> would be function local.

You can use local labels [1] to make it local to a block (see my
version, just need to rename the retry label into __drm_exec_retry). I
checked, and this is used elsewhere in the kernel (like in
linux/wait.h, which is a core feature), so it should be safe to use.

[1]https://gcc.gnu.org/onlinedocs/gcc/Local-Labels.html
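
A minimal, self-contained example of that extension (plain GNU C, nothing 
drm_exec specific; the function is made up for illustration):

static int local_label_demo(int tries)
{
	int attempts = 0;

	{
		__label__ retry;	/* label scoped to this block */
retry:
		if (++attempts < tries)
			goto retry;
	}

	{
		__label__ retry;	/* same name, different block: no clash */
retry:
		if (attempts > 0) {
			attempts--;
			goto retry;
		}
	}

	return attempts;
}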

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19  9:48                 ` Christian König
@ 2023-06-19 11:06                   ` Thomas Hellström (Intel)
  2023-06-21 13:35                     ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Hellström (Intel) @ 2023-06-19 11:06 UTC (permalink / raw)
  To: Christian König, Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel


On 6/19/23 11:48, Christian König wrote:
> Hi,
>
> Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>> Sometimes you want to just drop the contended lock after the above 
>>>> relaxation. (Eviction would be one example), and not add as 
>>>> prelocked, if the contended object goes out of scope. Eviction 
>>>> would in some situations be one such example, -EDEADLOCK leading to 
>>>> an error path where the object should otherwise be freed is 
>>>> another. Perhaps we could add an argument to prepare_obj() as to 
>>>> whether the object should be immediately put after relaxation.
>>>
>>> I was considering a try_prepare version as well, that should cover 
>>> this use case.
>>
>> That sounds a bit different from this use-case. The use-case above 
>> would, on -EDEADLOCK actually unlock everything, then lock-slow the 
>> contending lock and then immediately unlock it and drop.
>
> Hui? What would that be good for?

It's for the case where you have nested locking, the contended lock has 
gone out-of-scope and you're probably not going to need it on the next 
attempt. I think the refcounted "prelocked" object that is lacking in 
the i915 variant will resolve all correctness / uaf issues, but still 
the object might be needlessly carried around for yet another locking round.


>
>> It sounds like try_prepare would just skip locking and continue with 
>> everything locked so far still locked?
>
> Correct.
>
>>
>>>
>>>>
>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>> +    if (unlikely(ret)) {
>>>>> +        dma_resv_unlock(obj->resv);
>>>>> +        goto error_dropref;
>>>>> +    }
>>>>> +
>>>>> +    swap(exec->prelocked, obj);
>>>>> +
>>>>> +error_dropref:
>>>>> +    /* Always cleanup the contention so that error handling can 
>>>>> kick in */
>>>>> +    drm_gem_object_put(obj);
>>>>> +    exec->contended = NULL;
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>>> + * @exec: the drm_exec object with the state
>>>>> + * @obj: the GEM object to prepare
>>>>> + * @num_fences: how many fences to reserve
>>>>> + *
>>>>> + * Prepare a GEM object for use by locking it and reserving fence 
>>>>> slots. All
>>>>> + * successfully locked objects are put into the locked container.
>>>>> + *
>>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>>> object is
>>>>> + * already locked, -ENOMEM when memory allocation failed and zero 
>>>>> for success.
>>>>> + */
>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>> drm_gem_object *obj,
>>>>> +             unsigned int num_fences)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = drm_exec_lock_contended(exec);
>>>>> +    if (unlikely(ret))
>>>>> +        return ret;
>>>>> +
>>>>> +    if (exec->prelocked == obj) {
>>>>> +        drm_gem_object_put(exec->prelocked);
>>>>> +        exec->prelocked = NULL;
>>>>> +
>>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>> +    }
>>>>> +
>>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>>> +        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
>>>>> +    else
>>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>>> +
>>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>>> +        drm_gem_object_get(obj);
>>>>> +        exec->contended = obj;
>>>>> +        return -EDEADLK;
>>>>> +    }
>>>>> +
>>>>> +    if (unlikely(ret == -EALREADY &&
>>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>>> +        goto reserve_fences;
>>>>> +
>>>>> +    if (unlikely(ret))
>>>>> +        return ret;
>>>>> +
>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>> +    if (ret)
>>>>> +        goto error_unlock;
>>>>> +
>>>>> +reserve_fences:
>>>>> +    /* Keep locked when reserving fences fails */
>>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>
>>>> Ugh, what is the use-case for keeping things locked here? How would 
>>>> a caller tell the difference between an error where everything is 
>>>> locked and nothing is locked? IMO, we should unlock on error here. 
>>>> If there indeed is a use-case we should add a separate function for 
>>>> reserving fences for all locked objects, rather than returning 
>>>> sometimes locked on error sometime not.
>>>
>>> We return the object locked here because it was too much churn to 
>>> remove it again from the array and we are getting fully cleaned up 
>>> at the end anyway.
>>
>> OK, so if we add an unlock functionality, we could just have a 
>> consistent locking state on error return?
>
> Yeah, that should work. Going to work on this.

Great.

Thanks,

Thomas


>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Thomas
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>>>
>>>>> +
>>>>> +error_unlock:
>>>>> +    dma_resv_unlock(obj->resv);
>>>>> +    return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>>> + * @exec: the drm_exec object with the state
>>>>> + * @objects: array of GEM object to prepare
>>>>> + * @num_objects: number of GEM objects in the array
>>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>>> + *
>>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>>> aborts on first
>>>>> + * error otherwise. Reserves @num_fences on each GEM object after 
>>>>> locking it.
>>>>> + *
>>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM when 
>>>>> memory
>>>>> + * allocation failed and zero for success.
>>>>> + */
>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>> +               struct drm_gem_object **objects,
>>>>> +               unsigned int num_objects,
>>>>> +               unsigned int num_fences)
>>>>> +{
>>>>> +    int ret;
>>>>> +
>>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>>> +        if (ret)
>>>>> +            return ret;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>>> +
>>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>>> new file mode 100644
>>>>> index 000000000000..b1a5da0509c1
>>>>> --- /dev/null
>>>>> +++ b/include/drm/drm_exec.h
>>>>> @@ -0,0 +1,130 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>>> +
>>>>> +#ifndef __DRM_EXEC_H__
>>>>> +#define __DRM_EXEC_H__
>>>>> +
>>>>> +#include <linux/ww_mutex.h>
>>>>> +
>>>>> +struct drm_gem_object;
>>>>> +
>>>>> +/**
>>>>> + * enum drm_exec_flags - Execution context flags
>>>>> + */
>>>>> +enum drm_exec_flags {
>>>>> +    /**
>>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>>> interruptible locking
>>>>> +     * functions.
>>>>> +     */
>>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>>> +
>>>>> +    /**
>>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>>> EALREADY errors.
>>>>> +     */
>>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct drm_exec - Execution context
>>>>> + */
>>>>> +struct drm_exec {
>>>>> +    /**
>>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>>> +     */
>>>>> +    u32 flags;
>>>>> +
>>>>> +    /**
>>>>> +     * @ticket: WW ticket used for acquiring locks
>>>>> +     */
>>>>> +    struct ww_acquire_ctx    ticket;
>>>>> +
>>>>> +    /**
>>>>> +     * @num_objects: number of objects locked
>>>>> +     */
>>>>> +    unsigned int        num_objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @max_objects: maximum objects in array
>>>>> +     */
>>>>> +    unsigned int        max_objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @objects: array of the locked objects
>>>>> +     */
>>>>> +    struct drm_gem_object    **objects;
>>>>> +
>>>>> +    /**
>>>>> +     * @contended: contended GEM object we backed off for
>>>>> +     */
>>>>> +    struct drm_gem_object    *contended;
>>>>> +
>>>>> +    /**
>>>>> +     * @prelocked: already locked GEM object due to contention
>>>>> +     */
>>>>> +    struct drm_gem_object *prelocked;
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>>> objects
>>>>> + * @exec: drm_exec object
>>>>> + * @index: unsigned long index for the iteration
>>>>> + * @obj: the current GEM object
>>>>> + *
>>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>>> object.
>>>>> + */
>>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>>> +    for (index = 0, obj = (exec)->objects[0];        \
>>>>> +         index < (exec)->num_objects;            \
>>>>> +         ++index, obj = (exec)->objects[index])
>>>>> +
>>>>> +/**
>>>>> + * drm_exec_until_all_locked - retry objects preparation until 
>>>>> all objects
>>>>> + * are locked
>>>>> + * @exec: drm_exec object
>>>>> + * @expr: expression to be evaluated on each attempt
>>>>> + *
>>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>>> detected,
>>>>> + * rollbacks and retries.
>>>>> + *
>>>>> + * @expr is typically a function that tries to prepare objects using
>>>>> + * drm_exec_prepare_obj().
>>>>> + *
>>>>> + * If we take drm_exec_prepare_array() as an example, you should do:
>>>>> + *
>>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>>> + *                    drm_exec_prepare_array(exec,
>>>>> + *                                   objs,
>>>>> + *                                   num_objs,
>>>>> + *                                   num_fences));
>>>>> + *    if (ret)
>>>>> + *        goto error_path;
>>>>> + *
>>>>> + *    ...
>>>>> + *
>>>>> + * Returns: 0 on success, a negative error code on failure.
>>>>> + */
>>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>>> +    ({                        \
>>>>> +        __label__ retry;            \
>>>>> +        int __ret;                \
>>>>> +retry:                            \
>>>>> +        __ret = expr;                \
>>>>> +        if ((exec)->contended) {        \
>>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>>> +            drm_exec_reset(exec);        \
>>>>> +            goto retry;            \
>>>>> +        }                    \
>>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>>> +        __ret;                    \
>>>>> +    })
>>>>> +
>>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>> drm_gem_object *obj,
>>>>> +             unsigned int num_fences);
>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>> +               struct drm_gem_object **objects,
>>>>> +               unsigned int num_objects,
>>>>> +               unsigned int num_fences);
>>>>> +
>>>>> +#endif

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 11:05                 ` Boris Brezillon
@ 2023-06-19 12:01                   ` Boris Brezillon
  0 siblings, 0 replies; 50+ messages in thread
From: Boris Brezillon @ 2023-06-19 12:01 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

On Mon, 19 Jun 2023 13:05:02 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Mon, 19 Jun 2023 12:44:06 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 19.06.23 um 12:12 schrieb Boris Brezillon:  
> > > [SNIP]
> > > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > > an expression, so in theory, you don't have to define a separate
> > > function.
> > >
> > > 	drm_exec_until_all_locked(&exec, {
> > > 		/* inlined-code */
> > > 		int ret;
> > >
> > > 		ret = blabla()
> > > 		if (ret)
> > > 			goto error;
> > >
> > > 		...
> > >
> > > error:
> > > 		/* return value. */
> > > 		ret;
> > > 	});
> > >
> > > This being said, as soon as you have several failure paths,
> > > it makes things a lot easier/controllable if you make it a function,
> > > and I honestly don't think the readability would suffer from having a
> > > function defined just above the user. My main concern with the original
> > > approach was the risk of calling continue/break_if_contended() in the
> > > wrong place, and also the fact you can't really externalize things to
> > > a function if you're looking for a cleaner split. At least with
> > > drm_exec_until_all_locked() you can do both.    
> > 
> > Yeah, but that means that you can't use return inside your code block 
> > and instead has to define an error label for handling "normal" 
> > contention which is what I'm trying to avoid here.
> > 
> > How about:
> > 
> > #define drm_exec_until_all_locked(exec)    \
> >          __drm_exec_retry: if (drm_exec_cleanup(exec))
> > 
> > 
> > #define drm_exec_retry_on_contention(exec)              \
> >          if (unlikely(drm_exec_is_contended(exec)))      \
> >                  goto __drm_exec_retry
> > 
> > 
> > And then use it like:
> > 
> > drm_exec_until_all_locked(exec)
> > {
> >      ret = drm_exec_prepare_obj(exec, obj);
> >      drm_exec_retry_on_contention(exec);
> > }  
> 
> That would work, and I was about to suggest extending my proposal with
> a drm_exec_retry_on_contention() to support both use cases. The only
> downside is the fact you might be able to break out of a loop that has
> local variables, which will leak stack space.

Nevermind, brain fart on my end. It shouldn't leak any stack space, so
yeah, I think that's a good compromise.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 10:44               ` Christian König
  2023-06-19 11:05                 ` Boris Brezillon
@ 2023-06-19 12:29                 ` Boris Brezillon
  2023-06-20  6:47                   ` Boris Brezillon
  1 sibling, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-19 12:29 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

On Mon, 19 Jun 2023 12:44:06 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
> > [SNIP]
> > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > an expression, so in theory, you don't have to define a separate
> > function.
> >
> > 	drm_exec_until_all_locked(&exec, {
> > 		/* inlined-code */
> > 		int ret;
> >
> > 		ret = blabla()
> > 		if (ret)
> > 			goto error;
> >
> > 		...
> >
> > error:
> > 		/* return value. */
> > 		ret;
> > 	});
> >
> > This being said, as soon as you have several failure paths,
> > it makes things a lot easier/controllable if you make it a function,
> > and I honestly don't think the readability would suffer from having a
> > function defined just above the user. My main concern with the original
> > approach was the risk of calling continue/break_if_contended() in the
> > wrong place, and also the fact you can't really externalize things to
> > a function if you're looking for a cleaner split. At least with
> > drm_exec_until_all_locked() you can do both.  
> 
> Yeah, but that means that you can't use return inside your code block 
> and instead has to define an error label for handling "normal" 
> contention which is what I'm trying to avoid here.

Sorry, didn't pay attention to this particular concern. Indeed, if you
want to return inside the expression, that's a problem.

> 
> How about:
> 
> #define drm_exec_until_all_locked(exec)    \
>          __drm_exec_retry: if (drm_exec_cleanup(exec))
> 
> 
> #define drm_exec_retry_on_contention(exec)              \
>          if (unlikely(drm_exec_is_contended(exec)))      \
>                  goto __drm_exec_retry
> 
> 
> And then use it like:
> 
> drm_exec_until_all_locked(exec)
> {
>      ret = drm_exec_prepare_obj(exec, obj);
>      drm_exec_retry_on_contention(exec);
> }
> 
> The only problem I can see with this is that the __drm_exec_retry label 
> would be function local.

Yeah, I'm not sure it's safe to use non-local labels for that: as soon
as you have more than one drm_exec_until_all_locked() call in a given
function it won't work. That's why I placed things in a block with
local labels, which in turn means you can't return directly,
unfortunately.
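
To make that concrete, here is a purely illustrative sketch of why a
fixed, function-scope label breaks as soon as the helper is used twice
in the same function (function and parameter names made up):

        int lock_two_sets(struct drm_exec *exec_a, struct drm_exec *exec_b)
        {
        __drm_exec_retry:                       /* first expansion */
                if (drm_exec_cleanup(exec_a)) {
                        /* prepare the first set of objects */
                }

        __drm_exec_retry:                       /* second expansion: duplicate
                                                 * label, doesn't even compile */
                if (drm_exec_cleanup(exec_b)) {
                        /* prepare the second set of objects */
                }

                return 0;
        }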

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-05-04 11:51 ` [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2 Christian König
  2023-06-12 13:16   ` Tatsuyuki Ishi
@ 2023-06-20  4:07   ` Tatsuyuki Ishi
  2023-06-20  4:14     ` Tatsuyuki Ishi
  2023-06-20  8:12     ` Christian König
  1 sibling, 2 replies; 50+ messages in thread
From: Tatsuyuki Ishi @ 2023-06-20  4:07 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

+Boris and +Matthew in case you want to take over this patch set.

Here are some review / testing comments, including those I posted before 
to ease tracking.

On 5/4/23 20:51, Christian König wrote:
> Use the new component here as well and remove the old handling.
> 
> v2: drop dupplicate handling
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  71 ++-----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 210 +++++++++-----------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   7 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
>   7 files changed, 115 insertions(+), 204 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 02b827785e39..eba3e4f01ea6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
>   
>   	list->first_userptr = first_userptr;
>   	list->num_entries = num_entries;
> +	sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
> +	     amdgpu_bo_list_entry_cmp, NULL);

Previously amdgpu_bo_list_get_list sorted all entries, but this one only 
sorts userptr entries. I think this changes behavior?

> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>   		e->user_invalidated = userpage_invalidated;
>   	}
>   
> -	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
> -				   &duplicates);
> -	if (unlikely(r != 0)) {
> -		if (r != -ERESTARTSYS)
> -			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
> -		goto out_free_user_pages;
> +	drm_exec_while_not_all_locked(&p->exec) {
> +		r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
> +		drm_exec_continue_on_contention(&p->exec);

Duplicate handling is needed for pretty much every call of 
amdgpu_vm_lock_pd, as bo->tbo.base.resv == vm->root.bo->tbo.base.resv 
for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID.

I think Boris's suggestion of having this through a common 
DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.

> +		if (unlikely(r))
> +			goto out_free_user_pages;
> +
> +		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);

Previously there were comments for how the fence count is calculated, 
now they seem to be removed. I'd prefer if they were properly retained 
as finding out who calls drm_resv_add_fence is not trivial, and wrong 
reservation count can also be really hard to debug.

Likewise for amdgpu_vm_lock_pd (which was added in another commit).

> +			drm_exec_break_on_contention(&p->exec);
> +			if (unlikely(r))
> +				goto out_free_user_pages;
> +
> +			e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
> +			e->range = NULL;
> +		}
> +		drm_exec_continue_on_contention(&p->exec);
> +
> +		if (p->uf_bo) {
> +			r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
> +						 2);
> +			drm_exec_continue_on_contention(&p->exec);
> +			if (unlikely(r))
> +				goto out_free_user_pages;
> +		}
>   	}
>   
> -	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> -		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
> +		struct mm_struct *usermm;
>   
> -		e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +		usermm = amdgpu_ttm_tt_get_usermm(e->bo->tbo.ttm);
> +		if (usermm && usermm != current->mm) {
> +			r = -EPERM;
> +			goto out_free_user_pages;
> +		}
> +
> +		if (amdgpu_ttm_tt_is_userptr(e->bo->tbo.ttm) &&
> +		    e->user_invalidated && e->user_pages) {
> +			amdgpu_bo_placement_from_domain(e->bo,
> +							AMDGPU_GEM_DOMAIN_CPU);
> +			r = ttm_bo_validate(&e->bo->tbo, &e->bo->placement,
> +					    &ctx);
> +			if (r)
> +				goto out_free_user_pages;
> +
> +			amdgpu_ttm_tt_set_user_pages(e->bo->tbo.ttm,
> +						     e->user_pages);
> +		}
> +
> +		kvfree(e->user_pages);
> +		e->user_pages = NULL;
>   	}
>   
>   	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
> @@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	 */
>   	r = 0;
>   	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
> -		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> -		r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, e->range);
> +		r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
> +							e->range);
>   		e->range = NULL;

e->range = NULL; needs to be removed, or it's causing *massive* memory 
leaks.

>   	}
>   	if (r) {
> @@ -1307,20 +1281,22 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	}
>   
>   	p->fence = dma_fence_get(&leader->base.s_fence->finished);
> -	list_for_each_entry(e, &p->validated, tv.head) {
> +	drm_exec_for_each_locked_object(&p->exec, index, gobj) {
> +
> +		ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
>   
>   		/* Everybody except for the gang leader uses READ */
>   		for (i = 0; i < p->gang_size; ++i) {
>   			if (p->jobs[i] == leader)
>   				continue;
>   
> -			dma_resv_add_fence(e->tv.bo->base.resv,
> +			dma_resv_add_fence(gobj->resv,
>   					   &p->jobs[i]->base.s_fence->finished,
>   					   DMA_RESV_USAGE_READ);
>   		}
>   
> -		/* The gang leader is remembered as writer */
> -		e->tv.num_shared = 0;
> +		/* The gang leader as remembered as writer */
> +		dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
>   	}

Previously PD used READ accesses, now everything is WRITE. This probably 
isn't right.

Regards,
Tatsuyuki

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  4:07   ` Tatsuyuki Ishi
@ 2023-06-20  4:14     ` Tatsuyuki Ishi
  2023-06-20  8:13       ` Christian König
  2023-06-20  8:12     ` Christian König
  1 sibling, 1 reply; 50+ messages in thread
From: Tatsuyuki Ishi @ 2023-06-20  4:14 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

On 6/20/23 13:07, Tatsuyuki Ishi wrote:
>> @@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct 
>> amdgpu_cs_parser *p,
>>        */
>>       r = 0;
>>       amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>> -        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>> -
>> -        r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, e->range);
>> +        r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
>> +                            e->range);
>>           e->range = NULL;
> 
> e->range = NULL; needs to be removed, or it's causing *massive* memory 
> leaks.

Actually, I quoted the wrong hunk, the correct one is below.

> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>  		e->user_invalidated = userpage_invalidated;
>  	}
>  
> -	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
> -				   &duplicates);
> -	if (unlikely(r != 0)) {
> -		if (r != -ERESTARTSYS)
> -			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
> -		goto out_free_user_pages;
> +	drm_exec_while_not_all_locked(&p->exec) {
> +		r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
> +		drm_exec_continue_on_contention(&p->exec);
> +		if (unlikely(r))
> +			goto out_free_user_pages;
> +
> +		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +			r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
> +			drm_exec_break_on_contention(&p->exec);
> +			if (unlikely(r))
> +				goto out_free_user_pages;
> +
> +			e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
> +			e->range = NULL;

This causes the leak.

> +		}
> +		drm_exec_continue_on_contention(&p->exec);
> +
> +		if (p->uf_bo) {
> +			r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
> +						 2);
> +			drm_exec_continue_on_contention(&p->exec);
> +			if (unlikely(r))
> +				goto out_free_user_pages;
> +		}
>  	}

Tatsuyuki

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 12:29                 ` Boris Brezillon
@ 2023-06-20  6:47                   ` Boris Brezillon
  2023-06-20  7:28                     ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-20  6:47 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

On Mon, 19 Jun 2023 14:29:23 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Mon, 19 Jun 2023 12:44:06 +0200
> Christian König <christian.koenig@amd.com> wrote:
> 
> > Am 19.06.23 um 12:12 schrieb Boris Brezillon:  
> > > [SNIP]
> > > Note that the drm_exec_until_all_locked() helper I introduced is taking
> > > an expression, so in theory, you don't have to define a separate
> > > function.
> > >
> > > 	drm_exec_until_all_locked(&exec, {
> > > 		/* inlined-code */
> > > 		int ret;
> > >
> > > 		ret = blabla()
> > > 		if (ret)
> > > 			goto error;
> > >
> > > 		...
> > >
> > > error:
> > > 		/* return value. */
> > > 		ret;
> > > 	});
> > >
> > > This being said, as soon as you have several failure paths,
> > > it makes things a lot easier/controllable if you make it a function,
> > > and I honestly don't think the readability would suffer from having a
> > > function defined just above the user. My main concern with the original
> > > approach was the risk of calling continue/break_if_contended() in the
> > > wrong place, and also the fact you can't really externalize things to
> > > a function if you're looking for a cleaner split. At least with
> > > drm_exec_until_all_locked() you can do both.    
> > 
> > Yeah, but that means that you can't use return inside your code block 
> > and instead has to define an error label for handling "normal" 
> > contention which is what I'm trying to avoid here.  
> 
> Sorry, didn't pay attention to this particular concern. Indeed, if you
> want to return inside the expression, that's a problem.

Sorry, that's wrong again. Had trouble focusing yesterday...

So, returning directly from the expression block should be perfectly
fine. The only problem is breaking out of the retry loop early and
propagating the error, but that's no more or less problematic than it
was before. We just need the drm_exec_retry_on_contention() helper you
suggested, and a drm_exec_stop() that would go to some local
__drm_exec_stop label.

	int ret = 0;

	ret = drm_exec_until_all_locked(exec, ({
		...
		ret = drm_exec_prepare_obj(exec, objA, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			drm_exec_stop(exec, ret);
		...

		ret = drm_exec_prepare_obj(exec, objB, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			drm_exec_stop(exec, ret);

		0;
	}));

Which is pretty close to the syntax you defined initially, except for
the '0;' oddity at the end, which is ugly, I admit.
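
For completeness, the two helpers assumed above could look roughly like
this (untested sketch; __drm_exec_retry, __drm_exec_stop and
__drm_exec_ret would be local labels/variables declared by the
surrounding drm_exec_until_all_locked() block, so all names here are
made up):

        #define drm_exec_retry_on_contention(exec)                      \
                do {                                                    \
                        if (unlikely(drm_exec_is_contended(exec)))      \
                                goto __drm_exec_retry;                  \
                } while (0)

        #define drm_exec_stop(exec, err)                                \
                do {                                                    \
                        __drm_exec_ret = (err);                         \
                        goto __drm_exec_stop;                           \
                } while (0)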

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-20  6:47                   ` Boris Brezillon
@ 2023-06-20  7:28                     ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-06-20  7:28 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam,
	Thomas Hellström (Intel),
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel,
	felix.kuehling

Am 20.06.23 um 08:47 schrieb Boris Brezillon:
> On Mon, 19 Jun 2023 14:29:23 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
>> On Mon, 19 Jun 2023 12:44:06 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 19.06.23 um 12:12 schrieb Boris Brezillon:
>>>> [SNIP]
>>>> Note that the drm_exec_until_all_locked() helper I introduced is taking
>>>> an expression, so in theory, you don't have to define a separate
>>>> function.
>>>>
>>>> 	drm_exec_until_all_locked(&exec, {
>>>> 		/* inlined-code */
>>>> 		int ret;
>>>>
>>>> 		ret = blabla()
>>>> 		if (ret)
>>>> 			goto error;
>>>>
>>>> 		...
>>>>
>>>> error:
>>>> 		/* return value. */
>>>> 		ret;
>>>> 	});
>>>>
>>>> This being said, as soon as you have several failure paths,
>>>> it makes things a lot easier/controllable if you make it a function,
>>>> and I honestly don't think the readability would suffer from having a
>>>> function defined just above the user. My main concern with the original
>>>> approach was the risk of calling continue/break_if_contended() in the
>>>> wrong place, and also the fact you can't really externalize things to
>>>> a function if you're looking for a cleaner split. At least with
>>>> drm_exec_until_all_locked() you can do both.
>>> Yeah, but that means that you can't use return inside your code block
>>> and instead has to define an error label for handling "normal"
>>> contention which is what I'm trying to avoid here.
>> Sorry, didn't pay attention to this particular concern. Indeed, if you
>> want to return inside the expression, that's a problem.
> Sorry, that's wrong again. Had trouble focusing yesterday...
>
> So, returning directly from the expression block should be perfectly
> fine. The only problem is breaking out of the retry loop early and
> propagating the error, but that's no more or less problematic than it
> was before. We just need the drm_exec_retry_on_contention() helper you
> suggested, and a drm_exec_stop() that would go to some local
> __drm_exec_stop label.
>
> 	int ret = 0;
>
> 	ret = drm_exec_until_all_locked(exec, ({
> 		...
> 		ret = drm_exec_prepare_obj(exec, objA, 1);
> 		drm_exec_retry_on_contention(exec);
> 		if (ret)
> 			drm_exec_stop(exec, ret);
> 		...
>
> 		ret = drm_exec_prepare_obj(exec, objB, 1);
> 		drm_exec_retry_on_contention(exec);
> 		if (ret)
> 			drm_exec_stop(exec, ret);
>
> 		0;
> 	}));
>
> Which is pretty close to the syntax you defined initially, except for
> the '0;' oddity at the end, which is ugly, I admit.

Yeah, and it looks like giving code blocks as a macro argument is
usually not a good idea either.

How about this:

#define drm_exec_until_all_locked(exec)                         \
         for (void *__drm_exec_retry_ptr; ({                     \
                 __label__ __drm_exec_retry;                     \
__drm_exec_retry:                                               \
                 __drm_exec_retry_ptr = &&__drm_exec_retry;      \
                 drm_exec_cleanup(exec);                         \
         });)

#define drm_exec_retry_on_contention(exec)              \
         if (unlikely(drm_exec_is_contended(exec)))      \
                 goto *__drm_exec_retry_ptr


The problem is that you can't declare a label so that it dominates the 
body of the loop.

But what you can do is declare a void * that dominates the loop, then 
assign the address of a label to it and use goto* when you need it.

The goto* syntax is a gcc extension, but BPF is already using that in 
the upstream kernel.

It's quite a hack and I need to extend my testing a bit, but as far as I 
can see this actually works.
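
Usage in a driver would then be something like this (untested sketch,
objA/objB being whatever GEM objects the caller needs to lock):

        drm_exec_until_all_locked(&exec) {
                ret = drm_exec_prepare_obj(&exec, objA, 1);
                drm_exec_retry_on_contention(&exec);
                if (unlikely(ret))
                        goto error;

                ret = drm_exec_prepare_obj(&exec, objB, 1);
                drm_exec_retry_on_contention(&exec);
                if (unlikely(ret))
                        goto error;
        }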

Regards,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  4:07   ` Tatsuyuki Ishi
  2023-06-20  4:14     ` Tatsuyuki Ishi
@ 2023-06-20  8:12     ` Christian König
  2023-06-20  8:16       ` Tatsuyuki Ishi
  2023-06-20  8:28       ` Boris Brezillon
  1 sibling, 2 replies; 50+ messages in thread
From: Christian König @ 2023-06-20  8:12 UTC (permalink / raw)
  To: Tatsuyuki Ishi, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

Am 20.06.23 um 06:07 schrieb Tatsuyuki Ishi:
> +Boris and +Matthew in case you want to take over this patch set.
>
> Here are some review / testing comments, including those I posted 
> before to ease tracking.
>
> On 5/4/23 20:51, Christian König wrote:
>> Use the new component here as well and remove the old handling.
>>
>> v2: drop dupplicate handling
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  71 ++-----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   5 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 210 +++++++++-----------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   7 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
>>   7 files changed, 115 insertions(+), 204 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 02b827785e39..eba3e4f01ea6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device 
>> *adev, struct drm_file *filp,
>>         list->first_userptr = first_userptr;
>>       list->num_entries = num_entries;
>> +    sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
>> +         amdgpu_bo_list_entry_cmp, NULL);
>
> Previously amdgpu_bo_list_get_list sorted all entries, but this one 
> only sorts userptr entries. I think this changes behavior?

The intention here is to sort all entries except the userptrs. Need to 
double check.

>
>> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct 
>> amdgpu_cs_parser *p,
>>           e->user_invalidated = userpage_invalidated;
>>       }
>>   -    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>> -                   &duplicates);
>> -    if (unlikely(r != 0)) {
>> -        if (r != -ERESTARTSYS)
>> -            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>> -        goto out_free_user_pages;
>> +    drm_exec_while_not_all_locked(&p->exec) {
>> +        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
>> +        drm_exec_continue_on_contention(&p->exec);
>
> Duplicate handling is needed for pretty much every call of 
> amdgpu_vm_lock_pd, as bo->tbo.base.resv == vm->root.bo->tbo.base.resv 
> for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID.

Well no. AMDGPU_GEM_CREATE_VM_ALWAYS_VALID means that BOs should *not* 
be part of the relocation list. So when those cause an EALREADY here 
then userspace has a bug.

> I think Boris's suggestion of having this through a common 
> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.

No, again. The only driver which should accept duplicates is radeon, for 
all other drivers especially new ones duplicates should probably be 
rejected.

We only allow this for radeon because it is already UAPI, could be that 
we need to do this for amdgpu as well but I really hope we don't need this.
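
In other words (just a sketch), an EALREADY in the amdgpu CS path could 
simply be surfaced as a userspace bug:

        r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
        drm_exec_break_on_contention(&p->exec);
        if (r == -EALREADY)
                r = -EINVAL;    /* BO mentioned twice in the list */
        if (unlikely(r))
                goto out_free_user_pages;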

>
>> +        if (unlikely(r))
>> +            goto out_free_user_pages;
>> +
>> +        amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> +            r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
>
> Previously there were comments for how the fence count is calculated, 
> now they seem to be removed. I'd prefer if they were properly retained 
> as finding out who calls drm_resv_add_fence is not trivial, and wrong 
> reservation count can also be really hard to debug.

I need to double check this; the reservation count looks incorrect in 
the first place.

>
> Likewise for amdgpu_vm_lock_pd (which was added in another commit).
>
>> + drm_exec_break_on_contention(&p->exec);
>> +            if (unlikely(r))
>> +                goto out_free_user_pages;
>> +
>> +            e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
>> +            e->range = NULL;
>> +        }
>> +        drm_exec_continue_on_contention(&p->exec);
>> +
>> +        if (p->uf_bo) {
>> +            r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
>> +                         2);
>> +            drm_exec_continue_on_contention(&p->exec);
>> +            if (unlikely(r))
>> +                goto out_free_user_pages;
>> +        }
>>       }
>>   -    amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> -        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>> +    amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>> +        struct mm_struct *usermm;
>>   -        e->bo_va = amdgpu_vm_bo_find(vm, bo);
>> +        usermm = amdgpu_ttm_tt_get_usermm(e->bo->tbo.ttm);
>> +        if (usermm && usermm != current->mm) {
>> +            r = -EPERM;
>> +            goto out_free_user_pages;
>> +        }
>> +
>> +        if (amdgpu_ttm_tt_is_userptr(e->bo->tbo.ttm) &&
>> +            e->user_invalidated && e->user_pages) {
>> +            amdgpu_bo_placement_from_domain(e->bo,
>> +                            AMDGPU_GEM_DOMAIN_CPU);
>> +            r = ttm_bo_validate(&e->bo->tbo, &e->bo->placement,
>> +                        &ctx);
>> +            if (r)
>> +                goto out_free_user_pages;
>> +
>> +            amdgpu_ttm_tt_set_user_pages(e->bo->tbo.ttm,
>> +                             e->user_pages);
>> +        }
>> +
>> +        kvfree(e->user_pages);
>> +        e->user_pages = NULL;
>>       }
>>         amdgpu_cs_get_threshold_for_moves(p->adev, 
>> &p->bytes_moved_threshold,
>> @@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct 
>> amdgpu_cs_parser *p,
>>        */
>>       r = 0;
>>       amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>> -        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>> -
>> -        r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, e->range);
>> +        r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
>> +                            e->range);
>>           e->range = NULL;
>
> e->range = NULL; needs to be removed, or it's causing *massive* memory 
> leaks.


What? That sounds like nonsense to me; 
amdgpu_ttm_tt_get_user_pages_done() frees the range it gets.

>
>>       }
>>       if (r) {
>> @@ -1307,20 +1281,22 @@ static int amdgpu_cs_submit(struct 
>> amdgpu_cs_parser *p,
>>       }
>>         p->fence = dma_fence_get(&leader->base.s_fence->finished);
>> -    list_for_each_entry(e, &p->validated, tv.head) {
>> +    drm_exec_for_each_locked_object(&p->exec, index, gobj) {
>> +
>> + ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
>>             /* Everybody except for the gang leader uses READ */
>>           for (i = 0; i < p->gang_size; ++i) {
>>               if (p->jobs[i] == leader)
>>                   continue;
>>   -            dma_resv_add_fence(e->tv.bo->base.resv,
>> +            dma_resv_add_fence(gobj->resv,
>> &p->jobs[i]->base.s_fence->finished,
>>                          DMA_RESV_USAGE_READ);
>>           }
>>   -        /* The gang leader is remembered as writer */
>> -        e->tv.num_shared = 0;
>> +        /* The gang leader as remembered as writer */
>> +        dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
>>       }
>
> Previously PD used READ accesses, now everything is WRITE. This 
> probably isn't right.

That shouldn't matter. We should switch to using BOOKKEEP at some point, 
but for now that's irrelevant.

Thanks,
Christian.

>
> Regards,
> Tatsuyuki


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  4:14     ` Tatsuyuki Ishi
@ 2023-06-20  8:13       ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-06-20  8:13 UTC (permalink / raw)
  To: Tatsuyuki Ishi, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

Am 20.06.23 um 06:14 schrieb Tatsuyuki Ishi:
> On 6/20/23 13:07, Tatsuyuki Ishi wrote:
>>> @@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct 
>>> amdgpu_cs_parser *p,
>>>        */
>>>       r = 0;
>>>       amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>>> -        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>>> -
>>> -        r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, 
>>> e->range);
>>> +        r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
>>> +                            e->range);
>>>           e->range = NULL;
>>
>> e->range = NULL; needs to be removed, or it's causing *massive* 
>> memory leaks.
>
> Actually, I quoted the wrong hunk, the correct one is below.

Ah, yes that makes more sense. Going to take a look.

Thanks,
Christian.

>
>> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct 
>> amdgpu_cs_parser *p,
>>          e->user_invalidated = userpage_invalidated;
>>      }
>>
>> -    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>> -                   &duplicates);
>> -    if (unlikely(r != 0)) {
>> -        if (r != -ERESTARTSYS)
>> -            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>> -        goto out_free_user_pages;
>> +    drm_exec_while_not_all_locked(&p->exec) {
>> +        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
>> +        drm_exec_continue_on_contention(&p->exec);
>> +        if (unlikely(r))
>> +            goto out_free_user_pages;
>> +
>> +        amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> +            r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);
>> +            drm_exec_break_on_contention(&p->exec);
>> +            if (unlikely(r))
>> +                goto out_free_user_pages;
>> +
>> +            e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
>> +            e->range = NULL;
>
> This causes the leak.
>
>> +        }
>> +        drm_exec_continue_on_contention(&p->exec);
>> +
>> +        if (p->uf_bo) {
>> +            r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
>> +                         2);
>> +            drm_exec_continue_on_contention(&p->exec);
>> +            if (unlikely(r))
>> +                goto out_free_user_pages;
>> +        }
>>      }
>
> Tatsuyuki


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  8:12     ` Christian König
@ 2023-06-20  8:16       ` Tatsuyuki Ishi
  2023-06-20  9:04         ` Tatsuyuki Ishi
  2023-06-20  8:28       ` Boris Brezillon
  1 sibling, 1 reply; 50+ messages in thread
From: Tatsuyuki Ishi @ 2023-06-20  8:16 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

On 6/20/23 17:12, Christian König wrote:
> Am 20.06.23 um 06:07 schrieb Tatsuyuki Ishi:
>> +Boris and +Matthew in case you want to take over this patch set.
>>
>> Here are some review / testing comments, including those I posted before to ease tracking.
>>
>> On 5/4/23 20:51, Christian König wrote:
>>> Use the new component here as well and remove the old handling.
>>>
>>> v2: drop dupplicate handling
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  71 ++-----
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   5 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 210 +++++++++-----------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   7 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
>>>   7 files changed, 115 insertions(+), 204 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 02b827785e39..eba3e4f01ea6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
>>>         list->first_userptr = first_userptr;
>>>       list->num_entries = num_entries;
>>> +    sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
>>> +         amdgpu_bo_list_entry_cmp, NULL);
>>
>> Previously amdgpu_bo_list_get_list sorted all entries, but this one only sorts userptr entries. I think this changes behavior?
> 
> The intention here is to sort all entries except the userptrs. Need to double check.

Sorry, I mistyped. You're right that it sorts all entries except the userptrs. The previous code seems to sort all entries including userptrs.

>>
>>> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>           e->user_invalidated = userpage_invalidated;
>>>       }
>>>   -    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>> -                   &duplicates);
>>> -    if (unlikely(r != 0)) {
>>> -        if (r != -ERESTARTSYS)
>>> -            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>>> -        goto out_free_user_pages;
>>> +    drm_exec_while_not_all_locked(&p->exec) {
>>> +        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
>>> +        drm_exec_continue_on_contention(&p->exec);
>>
>> Duplicate handling is needed for pretty much every call of amdgpu_vm_lock_pd, as bo->tbo.base.resv == vm->root.bo->tbo.base.resv for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID.
> 
> Well no. AMDGPU_GEM_CREATE_VM_ALWAYS_VALID means that BOs should *not* be part of the relocation list. So when those cause an EALREADY here then userspace has a bug.

Sounds fair, lemme check how RADV is handling this again.

Tatsuyuki


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  8:12     ` Christian König
  2023-06-20  8:16       ` Tatsuyuki Ishi
@ 2023-06-20  8:28       ` Boris Brezillon
  2023-06-20  8:44         ` Christian König
  1 sibling, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-20  8:28 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, Tatsuyuki Ishi, arunpravin.paneerselvam,
	thomas_os, francois.dugast, amd-gfx, luben.tuikov, dakr,
	dri-devel, felix.kuehling

On Tue, 20 Jun 2023 10:12:13 +0200
Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

> > I think Boris's suggestion of having this through a common 
> > DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.  
> 
> No, again. The only driver which should accept duplicates is radeon, for 
> all other drivers especially new ones duplicates should probably be 
> rejected.
> 
> We only allow this for radeon because it is already UAPI, could be that 
> we need to do this for amdgpu as well but I really hope we don't need this.

Just want to describe the use case we have: we support submission in
batch (several jobs passed to the submit ioctl) with a
submit-all-or-nothing model: if any of the job description is passed
wrong args or causes an allocation error, we fail the whole group. In
the submission path, we want to prepare GEMs for all jobs. That means
adding enough fence slots for the number job finished fences. Given not
all jobs will access the same set of BOs, I thought I could use
duplicates support to make my life easier, because otherwise I have to
collect all BOs upfront, store them in a temporary array, and keep
track of the number of fence slots needed for each of them. I guess
the other option would be to over-estimate the number of slots and make
it equal to num_jobs for all BOs.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  8:28       ` Boris Brezillon
@ 2023-06-20  8:44         ` Christian König
  2023-06-20  9:09           ` Boris Brezillon
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-06-20  8:44 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: matthew.brost, Tatsuyuki Ishi, arunpravin.paneerselvam,
	thomas_os, francois.dugast, amd-gfx, luben.tuikov, dakr,
	dri-devel, felix.kuehling

Am 20.06.23 um 10:28 schrieb Boris Brezillon:
> On Tue, 20 Jun 2023 10:12:13 +0200
> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>
>>> I think Boris's suggestion of having this through a common
>>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.
>> No, again. The only driver which should accept duplicates is radeon, for
>> all other drivers especially new ones duplicates should probably be
>> rejected.
>>
>> We only allow this for radeon because it is already UAPI, could be that
>> we need to do this for amdgpu as well but I really hope we don't need this.
> Just want to describe the use case we have: we support submission in
> batch (several jobs passed to the submit ioctl) with a
> submit-all-or-nothing model: if any of the job description is passed
> wrong args or causes an allocation error, we fail the whole group. In
> the submission path, we want to prepare GEMs for all jobs. That means
> adding enough fence slots for the number job finished fences. Given not
> all jobs will access the same set of BOs, I thought I could use
> duplicates support to make my life easier, because otherwise I have to
> collect all BOs upfront, store them in a temporary array, and keep
> track of the number of fence slots needed for each of them. I guess
> the other option would be to over-estimate the number of slots and make
> it equal to num_jobs for all BOs.

Sounds pretty much what amdgpu is doing as well, but question is why 
don't you give just one list of BOs? Do you really want to add the 
fences that fine grained?

For radeon it turned out that we just had stupid userspace which 
sometimes mentioned a BO in the list twice.

On the other hand over estimating the number of fences needed is 
perfectly fine as well, that is rounded up to the next kvmalloc size or 
even next page size anyway.

So IIRC, if you have <510 fences you get either 14, 30, 62, 126 or 254, 
and above 510 it should be rounded up to the next 512.
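
(If I remember the dma_resv internals correctly, those sizes simply come 
from the shared fence list being grown to fill the kmalloc bucket it 
lands in, i.e. roughly max_fences = (bucket size - list header) / 
sizeof(struct dma_fence *).)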

Regards,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  8:16       ` Tatsuyuki Ishi
@ 2023-06-20  9:04         ` Tatsuyuki Ishi
  0 siblings, 0 replies; 50+ messages in thread
From: Tatsuyuki Ishi @ 2023-06-20  9:04 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel, matthew.brost, boris.brezillon

On 6/20/23 17:16, Tatsuyuki Ishi wrote:
> On 6/20/23 17:12, Christian König wrote:
>> Am 20.06.23 um 06:07 schrieb Tatsuyuki Ishi:
>>>> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>>           e->user_invalidated = userpage_invalidated;
>>>>       }
>>>>   -    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>>> -                   &duplicates);
>>>> -    if (unlikely(r != 0)) {
>>>> -        if (r != -ERESTARTSYS)
>>>> -            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>>>> -        goto out_free_user_pages;
>>>> +    drm_exec_while_not_all_locked(&p->exec) {
>>>> +        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
>>>> +        drm_exec_continue_on_contention(&p->exec);
>>>
>>> Duplicate handling is needed for pretty much every call of amdgpu_vm_lock_pd, as bo->tbo.base.resv == vm->root.bo->tbo.base.resv for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID.
>>
>> Well no. AMDGPU_GEM_CREATE_VM_ALWAYS_VALID means that BOs should *not* be part of the relocation list. So when those cause an EALREADY here then userspace has a bug.
> 
> Sounds fair, lemme check how RADV is handling this again.

I checked again and the relocation list was actually fine, but other places were not. For example,
amdgpu_gem_object_close locks both bo->tbo.base.resv and vm->root.bo->tbo.base.resv (PD) on its own.

This was the easily debuggable case since it caused an error log, but some other BO operations on
ALWAYS_VALID are presumably broken for the same reason.

Tatsuyuki

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  8:44         ` Christian König
@ 2023-06-20  9:09           ` Boris Brezillon
  2023-06-20  9:14             ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Boris Brezillon @ 2023-06-20  9:09 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, Tatsuyuki Ishi, arunpravin.paneerselvam,
	thomas_os, francois.dugast, amd-gfx, luben.tuikov, dakr,
	dri-devel, felix.kuehling

On Tue, 20 Jun 2023 10:44:26 +0200
Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

> Am 20.06.23 um 10:28 schrieb Boris Brezillon:
> > On Tue, 20 Jun 2023 10:12:13 +0200
> > Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> >  
> >>> I think Boris's suggestion of having this through a common
> >>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.  
> >> No, again. The only driver which should accept duplicates is radeon, for
> >> all other drivers especially new ones duplicates should probably be
> >> rejected.
> >>
> >> We only allow this for radeon because it is already UAPI, could be that
> >> we need to do this for amdgpu as well but I really hope we don't need this.  
> > Just want to describe the use case we have: we support submission in
> > batch (several jobs passed to the submit ioctl) with a
> > submit-all-or-nothing model: if any of the job description is passed
> > wrong args or causes an allocation error, we fail the whole group. In
> > the submission path, we want to prepare GEMs for all jobs. That means
> > adding enough fence slots for the number job finished fences. Given not
> > all jobs will access the same set of BOs, I thought I could use
> > duplicates support to make my life easier, because otherwise I have to
> > collect all BOs upfront, store them in a temporary array, and keep
> > track of the number of fence slots needed for each of them. I guess
> > the other option would be to over-estimate the number of slots and make
> > it equal to num_jobs for all BOs.  
> 
> Sounds pretty much what amdgpu is doing as well, but question is why 
> don't you give just one list of BOs? Do you really want to add the 
> fences that fine grained?

Actually, we don't give a list of BOs at all, we pass a VM, and lock
all BOs attached to the VM (similar to what Xe does). And, as all other
drivers being submitted recently, we use explicit sync, so most of
those VM BOs, except for the imported/exported ones, will be given a
BOOKKEEP fence.

The reason we need support for duplicates is because we also have
implicit BOs (like the HWRT object that's shared by the
geometry/fragment queues to pass data around), and those can be passed
to multiple jobs in a given batch and require special synchronization
(geometry job writes to them, fragment job reads from them, so we have
a reader/writer sync to express). I can of course de-duplicate upfront,
by parsing jobs and creating an array of BOs that need to be acquired
over the whole submission, but that's still one extra-step I'd prefer
to avoid, given the dma_resv framework allows us to figure it out at
lock time. I can also just deal with the EALREADY case in the driver
directly, it's not like it's super complicated anyway, just thought
other drivers would fall in the same situation, that's all.
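
Just to illustrate what I mean by handling it in the driver, the
fallback would be something like this (untested sketch, with a made-up
driver BO struct embedding a drm_gem_object as ->base):

        ret = drm_exec_prepare_obj(&exec, &bo->base, 1);
        if (ret == -EALREADY)
                /* Same resv already locked for a previous job in the batch,
                 * just make room for one more fence. */
                ret = dma_resv_reserve_fences(bo->base.resv, 1);
        if (ret)
                goto err;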

> 
> For radeon it turned out that we just had stupid userspace which 
> sometimes mentioned a BO in the list twice.

Okay, that's not the same thing, indeed.

> 
> On the other hand over estimating the number of fences needed is 
> perfectly fine as well, that is rounded up to the next kvmalloc size or 
> even next page size anyway.

Yeah, actually over-provisioning is not the most annoying part.
Iterating over jobs to collect 'meta'-BOs is, so if I can just rely on
EALREADY to detect that case and fallback to reserving an extra slot in
that situation, I'd prefer that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/13] drm/qxl: switch to using drm_exec
  2023-05-04 11:51 ` [PATCH 08/13] drm/qxl: switch to using drm_exec Christian König
@ 2023-06-20  9:13   ` Thomas Zimmermann
  2023-06-20  9:15     ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Zimmermann @ 2023-06-20  9:13 UTC (permalink / raw)
  To: Christian König, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel


Hi

Am 04.05.23 um 13:51 schrieb Christian König:
> Just a straightforward conversion without any optimization.
> 
> Only compile tested for now.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/qxl/Kconfig       |  1 +
>   drivers/gpu/drm/qxl/qxl_drv.h     |  7 ++--
>   drivers/gpu/drm/qxl/qxl_release.c | 67 ++++++++++++++++---------------
>   3 files changed, 39 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/qxl/Kconfig b/drivers/gpu/drm/qxl/Kconfig
> index ca3f51c2a8fe..9c8e433be33e 100644
> --- a/drivers/gpu/drm/qxl/Kconfig
> +++ b/drivers/gpu/drm/qxl/Kconfig
> @@ -5,6 +5,7 @@ config DRM_QXL
>   	select DRM_KMS_HELPER
>   	select DRM_TTM
>   	select DRM_TTM_HELPER
> +	select DRM_EXEC

Just some nitpicking, but can we try to keep these select statements 
sorted alphabetically?

>   	select CRC32
>   	help
>   	  QXL virtual GPU for Spice virtualization desktop integration.
> diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
> index ea993d7162e8..3e732648b332 100644
> --- a/drivers/gpu/drm/qxl/qxl_drv.h
> +++ b/drivers/gpu/drm/qxl/qxl_drv.h
> @@ -38,12 +38,12 @@
>   
>   #include <drm/drm_crtc.h>
>   #include <drm/drm_encoder.h>
> +#include <drm/drm_exec.h>
>   #include <drm/drm_gem_ttm_helper.h>
>   #include <drm/drm_ioctl.h>
>   #include <drm/drm_gem.h>
>   #include <drm/qxl_drm.h>
>   #include <drm/ttm/ttm_bo.h>
> -#include <drm/ttm/ttm_execbuf_util.h>
>   #include <drm/ttm/ttm_placement.h>
>   
>   #include "qxl_dev.h"
> @@ -101,7 +101,8 @@ struct qxl_gem {
>   };
>   
>   struct qxl_bo_list {
> -	struct ttm_validate_buffer tv;
> +	struct qxl_bo		*bo;
> +	struct list_head	list;
>   };
>   
>   struct qxl_crtc {
> @@ -151,7 +152,7 @@ struct qxl_release {
>   	struct qxl_bo *release_bo;
>   	uint32_t release_offset;
>   	uint32_t surface_release_id;
> -	struct ww_acquire_ctx ticket;
> +	struct drm_exec	exec;
>   	struct list_head bos;
>   };
>   
> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
> index 368d26da0d6a..da7cd9cd58f9 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -121,13 +121,11 @@ qxl_release_free_list(struct qxl_release *release)
>   {
>   	while (!list_empty(&release->bos)) {
>   		struct qxl_bo_list *entry;
> -		struct qxl_bo *bo;
>   
>   		entry = container_of(release->bos.next,
> -				     struct qxl_bo_list, tv.head);
> -		bo = to_qxl_bo(entry->tv.bo);
> -		qxl_bo_unref(&bo);
> -		list_del(&entry->tv.head);
> +				     struct qxl_bo_list, list);
> +		qxl_bo_unref(&entry->bo);
> +		list_del(&entry->list);
>   		kfree(entry);
>   	}
>   	release->release_bo = NULL;
> @@ -172,8 +170,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
>   {
>   	struct qxl_bo_list *entry;
>   
> -	list_for_each_entry(entry, &release->bos, tv.head) {
> -		if (entry->tv.bo == &bo->tbo)
> +	list_for_each_entry(entry, &release->bos, list) {
> +		if (entry->bo == bo)
>   			return 0;
>   	}
>   
> @@ -182,9 +180,8 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
>   		return -ENOMEM;
>   
>   	qxl_bo_ref(bo);
> -	entry->tv.bo = &bo->tbo;
> -	entry->tv.num_shared = 0;
> -	list_add_tail(&entry->tv.head, &release->bos);
> +	entry->bo = bo;
> +	list_add_tail(&entry->list, &release->bos);
>   	return 0;
>   }
>   
> @@ -221,21 +218,27 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
>   	if (list_is_singular(&release->bos))
>   		return 0;
>   
> -	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
> -				     !no_intr, NULL);
> -	if (ret)
> -		return ret;
> -
> -	list_for_each_entry(entry, &release->bos, tv.head) {
> -		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
> -
> -		ret = qxl_release_validate_bo(bo);
> -		if (ret) {
> -			ttm_eu_backoff_reservation(&release->ticket, &release->bos);
> -			return ret;
> +	drm_exec_init(&release->exec, !no_intr);
> +	drm_exec_while_not_all_locked(&release->exec) {
> +		list_for_each_entry(entry, &release->bos, list) {
> +			ret = drm_exec_prepare_obj(&release->exec,
> +						   &entry->bo->tbo.base,
> +						   1);
> +			drm_exec_break_on_contention(&release->exec);
> +			if (ret)
> +				goto error;
>   		}
>   	}
> +
> +	list_for_each_entry(entry, &release->bos, list) {
> +		ret = qxl_release_validate_bo(entry->bo);
> +		if (ret)
> +			goto error;
> +	}
>   	return 0;
> +error:
> +	drm_exec_fini(&release->exec);
> +	return ret;
>   }
>   
>   void qxl_release_backoff_reserve_list(struct qxl_release *release)
> @@ -245,7 +248,7 @@ void qxl_release_backoff_reserve_list(struct qxl_release *release)
>   	if (list_is_singular(&release->bos))
>   		return;
>   
> -	ttm_eu_backoff_reservation(&release->ticket, &release->bos);
> +	drm_exec_fini(&release->exec);
>   }
>   
>   int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,
> @@ -404,18 +407,18 @@ void qxl_release_unmap(struct qxl_device *qdev,
>   
>   void qxl_release_fence_buffer_objects(struct qxl_release *release)
>   {
> -	struct ttm_buffer_object *bo;
>   	struct ttm_device *bdev;
> -	struct ttm_validate_buffer *entry;
> +	struct qxl_bo_list *entry;
>   	struct qxl_device *qdev;
> +	struct qxl_bo *bo;
>   
>   	/* if only one object on the release its the release itself
>   	   since these objects are pinned no need to reserve */
>   	if (list_is_singular(&release->bos) || list_empty(&release->bos))
>   		return;
>   
> -	bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo;
> -	bdev = bo->bdev;
> +	bo = list_first_entry(&release->bos, struct qxl_bo_list, list)->bo;
> +	bdev = bo->tbo.bdev;
>   	qdev = container_of(bdev, struct qxl_device, mman.bdev);
>   
>   	/*
> @@ -426,14 +429,12 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
>   		       release->id | 0xf0000000, release->base.seqno);
>   	trace_dma_fence_emit(&release->base);
>   
> -	list_for_each_entry(entry, &release->bos, head) {
> +	list_for_each_entry(entry, &release->bos, list) {
>   		bo = entry->bo;
>   
> -		dma_resv_add_fence(bo->base.resv, &release->base,
> +		dma_resv_add_fence(bo->tbo.base.resv, &release->base,
>   				   DMA_RESV_USAGE_READ);
> -		ttm_bo_move_to_lru_tail_unlocked(bo);
> -		dma_resv_unlock(bo->base.resv);
> +		ttm_bo_move_to_lru_tail_unlocked(&bo->tbo);
>   	}
> -	ww_acquire_fini(&release->ticket);
> +	drm_exec_fini(&release->exec);
>   }
> -

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  9:09           ` Boris Brezillon
@ 2023-06-20  9:14             ` Christian König
  2023-06-20  9:20               ` Boris Brezillon
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-06-20  9:14 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: matthew.brost, Tatsuyuki Ishi, arunpravin.paneerselvam,
	thomas_os, francois.dugast, amd-gfx, luben.tuikov, dakr,
	dri-devel, felix.kuehling

Am 20.06.23 um 11:09 schrieb Boris Brezillon:
> On Tue, 20 Jun 2023 10:44:26 +0200
> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Am 20.06.23 um 10:28 schrieb Boris Brezillon:
>>> On Tue, 20 Jun 2023 10:12:13 +0200
>>> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>>>   
>>>>> I think Boris's suggestion of having this through a common
>>>>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.
>>>> No, again. The only driver which should accept duplicates is radeon, for
>>>> all other drivers especially new ones duplicates should probably be
>>>> rejected.
>>>>
>>>> We only allow this for radeon because it is already UAPI, could be that
>>>> we need to do this for amdgpu as well but I really hope we don't need this.
>>> Just want to describe the use case we have: we support submission in
>>> batches (several jobs passed to the submit ioctl) with a
>>> submit-all-or-nothing model: if any of the job descriptions has wrong
>>> args or causes an allocation error, we fail the whole group. In the
>>> submission path, we want to prepare GEMs for all jobs. That means
>>> adding enough fence slots for the number of job-finished fences. Given
>>> that not all jobs will access the same set of BOs, I thought I could
>>> use duplicates support to make my life easier, because otherwise I
>>> have to collect all BOs upfront, store them in a temporary array, and
>>> keep track of the number of fence slots needed for each of them. I
>>> guess the other option would be to over-estimate the number of slots
>>> and make it equal to num_jobs for all BOs.
>> Sounds pretty much like what amdgpu is doing as well, but the question
>> is: why don't you just give one list of BOs? Do you really want to add
>> the fences in such a fine-grained way?
> Actually, we don't give a list of BOs at all; we pass a VM and lock
> all BOs attached to the VM (similar to what Xe does). And, as with all
> other drivers submitted recently, we use explicit sync, so most of
> those VM BOs, except for the imported/exported ones, will be given a
> BOOKKEEP fence.
>
> The reason we need support for duplicates is that we also have
> implicit BOs (like the HWRT object that's shared by the
> geometry/fragment queues to pass data around), and those can be passed
> to multiple jobs in a given batch and require special synchronization
> (the geometry job writes to them, the fragment job reads from them, so
> we have a reader/writer sync to express). I can of course de-duplicate
> upfront, by parsing jobs and creating an array of BOs that need to be
> acquired over the whole submission, but that's still one extra step I'd
> prefer to avoid, given that the dma_resv framework allows us to figure
> it out at lock time. I can also just deal with the EALREADY case in the
> driver directly; it's not like it's super complicated anyway, I just
> thought other drivers would fall into the same situation, that's all.

Well as long as you just need to ignore EALREADY, that should be trivial 
and doable.

What radeon needs is to keep EALREADY BOs in a separate container 
because we need to double check their properties to not break the UAPI.

I strongly think that this shouldn't be needed by any other driver.

Going to add a flag to ignore EALREADY which can be set during exec init.
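
A minimal sketch of how a driver could then opt in at init time (this
assumes the flag keeps the DRM_EXEC_FLAG_ALLOW_DUPLICATES name from the
v4 header quoted further down in the thread; the final name may differ):

    struct drm_exec exec;

    /* A driver that is fine with duplicate BOs in a single submission
     * opts in when initializing the context; -EALREADY from
     * drm_exec_prepare_obj() is then swallowed and only the fence slots
     * are topped up for the duplicate.
     */
    drm_exec_init(&exec, DRM_EXEC_FLAG_INTERRUPTIBLE |
                         DRM_EXEC_FLAG_ALLOW_DUPLICATES);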

Regards,
Christian.

>
>> For radeon it turned out that we just had stupid userspace which
>> sometimes mentioned a BO in the list twice.
> Okay, that's not the same thing, indeed.
>
>> On the other hand, over-estimating the number of fences needed is
>> perfectly fine as well; that is rounded up to the next kvmalloc size
>> or even the next page size anyway.
> Yeah, actually over-provisioning is not the most annoying part.
> Iterating over jobs to collect 'meta'-BOs is, so if I can just rely on
> EALREADY to detect that case and fall back to reserving an extra slot in
> that situation, I'd prefer that.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/13] drm/qxl: switch to using drm_exec
  2023-06-20  9:13   ` Thomas Zimmermann
@ 2023-06-20  9:15     ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-06-20  9:15 UTC (permalink / raw)
  To: Thomas Zimmermann, francois.dugast, felix.kuehling,
	arunpravin.paneerselvam, thomas_os, dakr, luben.tuikov, amd-gfx,
	dri-devel

On 20.06.23 at 11:13, Thomas Zimmermann wrote:
> Hi
>
> On 04.05.23 at 13:51, Christian König wrote:
>> Just a straightforward conversion without any optimization.
>>
>> Only compile tested for now.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/qxl/Kconfig       |  1 +
>>   drivers/gpu/drm/qxl/qxl_drv.h     |  7 ++--
>>   drivers/gpu/drm/qxl/qxl_release.c | 67 ++++++++++++++++---------------
>>   3 files changed, 39 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/qxl/Kconfig b/drivers/gpu/drm/qxl/Kconfig
>> index ca3f51c2a8fe..9c8e433be33e 100644
>> --- a/drivers/gpu/drm/qxl/Kconfig
>> +++ b/drivers/gpu/drm/qxl/Kconfig
>> @@ -5,6 +5,7 @@ config DRM_QXL
>>       select DRM_KMS_HELPER
>>       select DRM_TTM
>>       select DRM_TTM_HELPER
>> +    select DRM_EXEC
>
> Just some nitpicking, but can we try to keep these select statements 
> sorted alphabetically?

Sure, good point. Going to apply that to other drivers as well.

Christian.

>
>>       select CRC32
>>       help
>>         QXL virtual GPU for Spice virtualization desktop integration.
>> diff --git a/drivers/gpu/drm/qxl/qxl_drv.h 
>> b/drivers/gpu/drm/qxl/qxl_drv.h
>> index ea993d7162e8..3e732648b332 100644
>> --- a/drivers/gpu/drm/qxl/qxl_drv.h
>> +++ b/drivers/gpu/drm/qxl/qxl_drv.h
>> @@ -38,12 +38,12 @@
>>     #include <drm/drm_crtc.h>
>>   #include <drm/drm_encoder.h>
>> +#include <drm/drm_exec.h>
>>   #include <drm/drm_gem_ttm_helper.h>
>>   #include <drm/drm_ioctl.h>
>>   #include <drm/drm_gem.h>
>>   #include <drm/qxl_drm.h>
>>   #include <drm/ttm/ttm_bo.h>
>> -#include <drm/ttm/ttm_execbuf_util.h>
>>   #include <drm/ttm/ttm_placement.h>
>>     #include "qxl_dev.h"
>> @@ -101,7 +101,8 @@ struct qxl_gem {
>>   };
>>     struct qxl_bo_list {
>> -    struct ttm_validate_buffer tv;
>> +    struct qxl_bo        *bo;
>> +    struct list_head    list;
>>   };
>>     struct qxl_crtc {
>> @@ -151,7 +152,7 @@ struct qxl_release {
>>       struct qxl_bo *release_bo;
>>       uint32_t release_offset;
>>       uint32_t surface_release_id;
>> -    struct ww_acquire_ctx ticket;
>> +    struct drm_exec    exec;
>>       struct list_head bos;
>>   };
>>   diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
>> b/drivers/gpu/drm/qxl/qxl_release.c
>> index 368d26da0d6a..da7cd9cd58f9 100644
>> --- a/drivers/gpu/drm/qxl/qxl_release.c
>> +++ b/drivers/gpu/drm/qxl/qxl_release.c
>> @@ -121,13 +121,11 @@ qxl_release_free_list(struct qxl_release *release)
>>   {
>>       while (!list_empty(&release->bos)) {
>>           struct qxl_bo_list *entry;
>> -        struct qxl_bo *bo;
>>             entry = container_of(release->bos.next,
>> -                     struct qxl_bo_list, tv.head);
>> -        bo = to_qxl_bo(entry->tv.bo);
>> -        qxl_bo_unref(&bo);
>> -        list_del(&entry->tv.head);
>> +                     struct qxl_bo_list, list);
>> +        qxl_bo_unref(&entry->bo);
>> +        list_del(&entry->list);
>>           kfree(entry);
>>       }
>>       release->release_bo = NULL;
>> @@ -172,8 +170,8 @@ int qxl_release_list_add(struct qxl_release 
>> *release, struct qxl_bo *bo)
>>   {
>>       struct qxl_bo_list *entry;
>>   -    list_for_each_entry(entry, &release->bos, tv.head) {
>> -        if (entry->tv.bo == &bo->tbo)
>> +    list_for_each_entry(entry, &release->bos, list) {
>> +        if (entry->bo == bo)
>>               return 0;
>>       }
>>   @@ -182,9 +180,8 @@ int qxl_release_list_add(struct qxl_release 
>> *release, struct qxl_bo *bo)
>>           return -ENOMEM;
>>         qxl_bo_ref(bo);
>> -    entry->tv.bo = &bo->tbo;
>> -    entry->tv.num_shared = 0;
>> -    list_add_tail(&entry->tv.head, &release->bos);
>> +    entry->bo = bo;
>> +    list_add_tail(&entry->list, &release->bos);
>>       return 0;
>>   }
>>   @@ -221,21 +218,27 @@ int qxl_release_reserve_list(struct 
>> qxl_release *release, bool no_intr)
>>       if (list_is_singular(&release->bos))
>>           return 0;
>>   -    ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos,
>> -                     !no_intr, NULL);
>> -    if (ret)
>> -        return ret;
>> -
>> -    list_for_each_entry(entry, &release->bos, tv.head) {
>> -        struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
>> -
>> -        ret = qxl_release_validate_bo(bo);
>> -        if (ret) {
>> -            ttm_eu_backoff_reservation(&release->ticket, 
>> &release->bos);
>> -            return ret;
>> +    drm_exec_init(&release->exec, !no_intr);
>> +    drm_exec_while_not_all_locked(&release->exec) {
>> +        list_for_each_entry(entry, &release->bos, list) {
>> +            ret = drm_exec_prepare_obj(&release->exec,
>> +                           &entry->bo->tbo.base,
>> +                           1);
>> + drm_exec_break_on_contention(&release->exec);
>> +            if (ret)
>> +                goto error;
>>           }
>>       }
>> +
>> +    list_for_each_entry(entry, &release->bos, list) {
>> +        ret = qxl_release_validate_bo(entry->bo);
>> +        if (ret)
>> +            goto error;
>> +    }
>>       return 0;
>> +error:
>> +    drm_exec_fini(&release->exec);
>> +    return ret;
>>   }
>>     void qxl_release_backoff_reserve_list(struct qxl_release *release)
>> @@ -245,7 +248,7 @@ void qxl_release_backoff_reserve_list(struct 
>> qxl_release *release)
>>       if (list_is_singular(&release->bos))
>>           return;
>>   -    ttm_eu_backoff_reservation(&release->ticket, &release->bos);
>> +    drm_exec_fini(&release->exec);
>>   }
>>     int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,
>> @@ -404,18 +407,18 @@ void qxl_release_unmap(struct qxl_device *qdev,
>>     void qxl_release_fence_buffer_objects(struct qxl_release *release)
>>   {
>> -    struct ttm_buffer_object *bo;
>>       struct ttm_device *bdev;
>> -    struct ttm_validate_buffer *entry;
>> +    struct qxl_bo_list *entry;
>>       struct qxl_device *qdev;
>> +    struct qxl_bo *bo;
>>         /* if only one object on the release its the release itself
>>          since these objects are pinned no need to reserve */
>>       if (list_is_singular(&release->bos) || list_empty(&release->bos))
>>           return;
>>   -    bo = list_first_entry(&release->bos, struct 
>> ttm_validate_buffer, head)->bo;
>> -    bdev = bo->bdev;
>> +    bo = list_first_entry(&release->bos, struct qxl_bo_list, list)->bo;
>> +    bdev = bo->tbo.bdev;
>>       qdev = container_of(bdev, struct qxl_device, mman.bdev);
>>         /*
>> @@ -426,14 +429,12 @@ void qxl_release_fence_buffer_objects(struct 
>> qxl_release *release)
>>                  release->id | 0xf0000000, release->base.seqno);
>>       trace_dma_fence_emit(&release->base);
>>   -    list_for_each_entry(entry, &release->bos, head) {
>> +    list_for_each_entry(entry, &release->bos, list) {
>>           bo = entry->bo;
>>   -        dma_resv_add_fence(bo->base.resv, &release->base,
>> +        dma_resv_add_fence(bo->tbo.base.resv, &release->base,
>>                      DMA_RESV_USAGE_READ);
>> -        ttm_bo_move_to_lru_tail_unlocked(bo);
>> -        dma_resv_unlock(bo->base.resv);
>> +        ttm_bo_move_to_lru_tail_unlocked(&bo->tbo);
>>       }
>> -    ww_acquire_fini(&release->ticket);
>> +    drm_exec_fini(&release->exec);
>>   }
>> -
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2
  2023-06-20  9:14             ` Christian König
@ 2023-06-20  9:20               ` Boris Brezillon
  0 siblings, 0 replies; 50+ messages in thread
From: Boris Brezillon @ 2023-06-20  9:20 UTC (permalink / raw)
  To: Christian König
  Cc: matthew.brost, Tatsuyuki Ishi, arunpravin.paneerselvam,
	thomas_os, francois.dugast, amd-gfx, luben.tuikov, dakr,
	dri-devel, felix.kuehling

On Tue, 20 Jun 2023 11:14:51 +0200
Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

> On 20.06.23 at 11:09, Boris Brezillon wrote:
> > On Tue, 20 Jun 2023 10:44:26 +0200
> > Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> >  
> >> On 20.06.23 at 10:28, Boris Brezillon wrote:
> >>> On Tue, 20 Jun 2023 10:12:13 +0200
> >>> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>     
> >>>>> I think Boris's suggestion of having this through a common
> >>>>> DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.  
> >>>> No, again. The only driver which should accept duplicates is radeon; for
> >>>> all other drivers, especially new ones, duplicates should probably be
> >>>> rejected.
> >>>>
> >>>> We only allow this for radeon because it is already UAPI, could be that
> >>>> we need to do this for amdgpu as well but I really hope we don't need this.  
> >>> Just want to describe the use case we have: we support submission in
> >>> batches (several jobs passed to the submit ioctl) with a
> >>> submit-all-or-nothing model: if any of the job descriptions has wrong
> >>> args or causes an allocation error, we fail the whole group. In the
> >>> submission path, we want to prepare GEMs for all jobs. That means
> >>> adding enough fence slots for the number of job-finished fences. Given
> >>> that not all jobs will access the same set of BOs, I thought I could
> >>> use duplicates support to make my life easier, because otherwise I
> >>> have to collect all BOs upfront, store them in a temporary array, and
> >>> keep track of the number of fence slots needed for each of them. I
> >>> guess the other option would be to over-estimate the number of slots
> >>> and make it equal to num_jobs for all BOs.
> >> Sounds pretty much like what amdgpu is doing as well, but the question
> >> is: why don't you just give one list of BOs? Do you really want to add
> >> the fences in such a fine-grained way?
> > Actually, we don't give a list of BOs at all; we pass a VM and lock
> > all BOs attached to the VM (similar to what Xe does). And, as with all
> > other drivers submitted recently, we use explicit sync, so most of
> > those VM BOs, except for the imported/exported ones, will be given a
> > BOOKKEEP fence.
> >
> > The reason we need support for duplicates is that we also have
> > implicit BOs (like the HWRT object that's shared by the
> > geometry/fragment queues to pass data around), and those can be passed
> > to multiple jobs in a given batch and require special synchronization
> > (the geometry job writes to them, the fragment job reads from them, so
> > we have a reader/writer sync to express). I can of course de-duplicate
> > upfront, by parsing jobs and creating an array of BOs that need to be
> > acquired over the whole submission, but that's still one extra step I'd
> > prefer to avoid, given that the dma_resv framework allows us to figure
> > it out at lock time. I can also just deal with the EALREADY case in the
> > driver directly; it's not like it's super complicated anyway, I just
> > thought other drivers would fall into the same situation, that's all.
> 
> Well as long as you just need to ignore EALREADY, that should be trivial 
> and doable.

Oh, yeah, that's all I need really. We probably don't want to add the
GEM object a second time in the array though, hence the goto
reserve_fences in my proposal when EALREADY is returned.
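
Roughly, a driver-side helper along these lines would achieve that without
any core flag (a sketch only, the helper name is made up; it relies on the
v4 behaviour that -EALREADY is returned while this context already holds
the reservation lock from the earlier prepare):

    static int prepare_job_bo(struct drm_exec *exec,
                              struct drm_gem_object *obj)
    {
            int ret;

            /* One fence slot per job that uses this BO. */
            ret = drm_exec_prepare_obj(exec, obj, 1);
            if (ret == -EALREADY)
                    /* Already locked and tracked for an earlier job in
                     * the batch: don't add it a second time, just
                     * reserve one more fence slot.
                     */
                    return dma_resv_reserve_fences(obj->resv, 1);
            return ret;
    }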

> 
> What radeon needs is to keep EALREADY BOs in a separate container 
> because we need to double check their properties to not break the UAPI.
> 
> I strongly think that this shouldn't be needed by any other driver.
> 
> Going to add a flag to ignore EALREADY which can be set during exec init.

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/13] drm: execution context for GEM buffers v4
  2023-06-19 11:06                   ` Thomas Hellström (Intel)
@ 2023-06-21 13:35                     ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-06-21 13:35 UTC (permalink / raw)
  To: Thomas Hellström (Intel), Boris Brezillon
  Cc: Matthew Brost, arunpravin.paneerselvam, felix.kuehling,
	francois.dugast, amd-gfx, luben.tuikov, dakr, dri-devel

On 19.06.23 at 13:06, Thomas Hellström (Intel) wrote:
>
> On 6/19/23 11:48, Christian König wrote:
>> Hi,
>>
>> On 19.06.23 at 11:33, Thomas Hellström (Intel) wrote:
>>> [SNIP]
>>>>> Sometimes you want to just drop the contended lock after the above
>>>>> relaxation, and not add it as prelocked, if the contended object
>>>>> goes out of scope. Eviction would in some situations be one such
>>>>> example; -EDEADLOCK leading to an error path where the object
>>>>> should otherwise be freed is another. Perhaps we could add an
>>>>> argument to prepare_obj() as to whether the object should be
>>>>> immediately put after relaxation.
>>>>
>>>> I was considering a try_prepare version as well; that should cover
>>>> this use case.
>>>
>>> That sounds a bit different from this use-case. The use-case above
>>> would, on -EDEADLOCK, actually unlock everything, then lock-slow the
>>> contending lock and then immediately unlock and drop it.
>>
>> Huh? What would that be good for?
>
> It's for the case where you have nested locking, the contended lock
> has gone out of scope, and you're probably not going to need it on the
> next attempt. I think the refcounted "prelocked" object that is
> lacking in the i915 variant will resolve all correctness / UAF issues,
> but the object might still be needlessly carried around for yet
> another locking round.

Yeah, but that case is so rare that we probably don't need to care about it.

I've changed the implementation so that it should now match the other 
requirements.

Going to send that out now, please double check.

Thanks,
Christian.
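
For reference, the try_prepare variant touched on in the quoted exchange
could look roughly like this (a sketch only, not part of the posted
series; drm_exec_obj_locked() is the internal tracking helper from the
quoted drm_exec.c, so such a variant would live next to it):

    int drm_exec_try_prepare_obj(struct drm_exec *exec,
                                 struct drm_gem_object *obj,
                                 unsigned int num_fences)
    {
            int ret;

            /* Never block and never unwind: on contention the caller
             * simply continues with everything locked so far.
             */
            if (!dma_resv_trylock(obj->resv))
                    return -EBUSY;

            ret = drm_exec_obj_locked(exec, obj);
            if (ret) {
                    dma_resv_unlock(obj->resv);
                    return ret;
            }

            return dma_resv_reserve_fences(obj->resv, num_fences);
    }

A caller would then typically skip the object, or fall back to the normal
prepare path, when -EBUSY is returned.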


>
>
>
>>
>>> It sounds like try_prepare would just skip locking and continue with 
>>> everything locked so far still locked?
>>
>> Correct.
>>
>>>
>>>>
>>>>>
>>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>>> +    if (unlikely(ret)) {
>>>>>> +        dma_resv_unlock(obj->resv);
>>>>>> +        goto error_dropref;
>>>>>> +    }
>>>>>> +
>>>>>> +    swap(exec->prelocked, obj);
>>>>>> +
>>>>>> +error_dropref:
>>>>>> +    /* Always cleanup the contention so that error handling can 
>>>>>> kick in */
>>>>>> +    drm_gem_object_put(obj);
>>>>>> +    exec->contended = NULL;
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_prepare_obj - prepare a GEM object for use
>>>>>> + * @exec: the drm_exec object with the state
>>>>>> + * @obj: the GEM object to prepare
>>>>>> + * @num_fences: how many fences to reserve
>>>>>> + *
>>>>>> + * Prepare a GEM object for use by locking it and reserving 
>>>>>> fence slots. All
>>>>>> + * successfully locked objects are put into the locked container.
>>>>>> + *
>>>>>> + * Returns: -EDEADLK if a contention is detected, -EALREADY when 
>>>>>> object is
>>>>>> + * already locked, -ENOMEM when memory allocation failed and 
>>>>>> zero for success.
>>>>>> + */
>>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>>> drm_gem_object *obj,
>>>>>> +             unsigned int num_fences)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    ret = drm_exec_lock_contended(exec);
>>>>>> +    if (unlikely(ret))
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    if (exec->prelocked == obj) {
>>>>>> +        drm_gem_object_put(exec->prelocked);
>>>>>> +        exec->prelocked = NULL;
>>>>>> +
>>>>>> +        return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
>>>>>> +        ret = dma_resv_lock_interruptible(obj->resv, 
>>>>>> &exec->ticket);
>>>>>> +    else
>>>>>> +        ret = dma_resv_lock(obj->resv, &exec->ticket);
>>>>>> +
>>>>>> +    if (unlikely(ret == -EDEADLK)) {
>>>>>> +        drm_gem_object_get(obj);
>>>>>> +        exec->contended = obj;
>>>>>> +        return -EDEADLK;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (unlikely(ret == -EALREADY &&
>>>>>> +        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
>>>>>> +        goto reserve_fences;
>>>>>> +
>>>>>> +    if (unlikely(ret))
>>>>>> +        return ret;
>>>>>> +
>>>>>> +    ret = drm_exec_obj_locked(exec, obj);
>>>>>> +    if (ret)
>>>>>> +        goto error_unlock;
>>>>>> +
>>>>>> +reserve_fences:
>>>>>> +    /* Keep locked when reserving fences fails */
>>>>>> +    return dma_resv_reserve_fences(obj->resv, num_fences);
>>>>>
>>>>> Ugh, what is the use-case for keeping things locked here? How
>>>>> would a caller tell the difference between an error where
>>>>> everything is locked and one where nothing is locked? IMO, we
>>>>> should unlock on error here. If there indeed is a use-case, we
>>>>> should add a separate function for reserving fences for all locked
>>>>> objects, rather than returning sometimes locked on error and
>>>>> sometimes not.
>>>>
>>>> We return the object locked here because it was too much churn to
>>>> remove it again from the array, and we are getting fully cleaned up
>>>> at the end anyway.
>>>
>>> OK, so if we add an unlock functionality, we could just have a 
>>> consistent locking state on error return?
>>
>> Yeah, that should work. Going to work on this.
>
> Great.
>
> Thanks,
>
> Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Thomas
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>> +
>>>>>> +error_unlock:
>>>>>> +    dma_resv_unlock(obj->resv);
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_exec_prepare_obj);
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_prepare_array - helper to prepare an array of objects
>>>>>> + * @exec: the drm_exec object with the state
>>>>>> + * @objects: array of GEM object to prepare
>>>>>> + * @num_objects: number of GEM objects in the array
>>>>>> + * @num_fences: number of fences to reserve on each GEM object
>>>>>> + *
>>>>>> + * Prepares all GEM objects in an array, handles contention but 
>>>>>> aborts on first
>>>>>> + * error otherwise. Reserves @num_fences on each GEM object 
>>>>>> after locking it.
>>>>>> + *
>>>>>> + * Returns: -EALREADY when object is already locked, -ENOMEM 
>>>>>> when memory
>>>>>> + * allocation failed and zero for success.
>>>>>> + */
>>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>>> +               struct drm_gem_object **objects,
>>>>>> +               unsigned int num_objects,
>>>>>> +               unsigned int num_fences)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    for (unsigned int i = 0; i < num_objects; ++i) {
>>>>>> +        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
>>>>>> +        if (ret)
>>>>>> +            return ret;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_exec_prepare_array);
>>>>>> +
>>>>>> +MODULE_DESCRIPTION("DRM execution context");
>>>>>> +MODULE_LICENSE("Dual MIT/GPL");
>>>>>> diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..b1a5da0509c1
>>>>>> --- /dev/null
>>>>>> +++ b/include/drm/drm_exec.h
>>>>>> @@ -0,0 +1,130 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
>>>>>> +
>>>>>> +#ifndef __DRM_EXEC_H__
>>>>>> +#define __DRM_EXEC_H__
>>>>>> +
>>>>>> +#include <linux/ww_mutex.h>
>>>>>> +
>>>>>> +struct drm_gem_object;
>>>>>> +
>>>>>> +/**
>>>>>> + * enum drm_exec_flags - Execution context flags
>>>>>> + */
>>>>>> +enum drm_exec_flags {
>>>>>> +    /**
>>>>>> +     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to true to use 
>>>>>> interruptible locking
>>>>>> +     * functions.
>>>>>> +     */
>>>>>> +    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
>>>>>> +
>>>>>> +    /**
>>>>>> +     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to true to allow 
>>>>>> EALREADY errors.
>>>>>> +     */
>>>>>> +    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * struct drm_exec - Execution context
>>>>>> + */
>>>>>> +struct drm_exec {
>>>>>> +    /**
>>>>>> +     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
>>>>>> +     */
>>>>>> +    u32 flags;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @ticket: WW ticket used for acquiring locks
>>>>>> +     */
>>>>>> +    struct ww_acquire_ctx    ticket;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @num_objects: number of objects locked
>>>>>> +     */
>>>>>> +    unsigned int        num_objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @max_objects: maximum objects in array
>>>>>> +     */
>>>>>> +    unsigned int        max_objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @objects: array of the locked objects
>>>>>> +     */
>>>>>> +    struct drm_gem_object    **objects;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @contended: contended GEM object we backed off for
>>>>>> +     */
>>>>>> +    struct drm_gem_object    *contended;
>>>>>> +
>>>>>> +    /**
>>>>>> +     * @prelocked: already locked GEM object due to contention
>>>>>> +     */
>>>>>> +    struct drm_gem_object *prelocked;
>>>>>> +};
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_for_each_locked_object - iterate over all the locked 
>>>>>> objects
>>>>>> + * @exec: drm_exec object
>>>>>> + * @index: unsigned long index for the iteration
>>>>>> + * @obj: the current GEM object
>>>>>> + *
>>>>>> + * Iterate over all the locked GEM objects inside the drm_exec 
>>>>>> object.
>>>>>> + */
>>>>>> +#define drm_exec_for_each_locked_object(exec, index, obj) \
>>>>>> +    for (index = 0, obj = (exec)->objects[0]; \
>>>>>> +         index < (exec)->num_objects; \
>>>>>> +         ++index, obj = (exec)->objects[index])
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_exec_until_all_locked - retry objects preparation until 
>>>>>> all objects
>>>>>> + * are locked
>>>>>> + * @exec: drm_exec object
>>>>>> + * @expr: expression to be evaluated on each attempt
>>>>>> + *
>>>>>> + * This helper tries to prepare objects and if a deadlock is 
>>>>>> detected,
>>>>>> + * rolls back and retries.
>>>>>> + *
>>>>>> + * @expr is typically a function that tries to prepare objects 
>>>>>> using
>>>>>> + * drm_exec_prepare_obj().
>>>>>> + *
>>>>>> + * If we take drm_exec_prepare_array() as an example, you should 
>>>>>> do:
>>>>>> + *
>>>>>> + *    ret = drm_exec_until_all_locked(exec,
>>>>>> + *                    drm_exec_prepare_array(exec,
>>>>>> + *                                   objs,
>>>>>> + *                                   num_objs,
>>>>>> + *                                   num_fences));
>>>>>> + *    if (ret)
>>>>>> + *        goto error_path;
>>>>>> + *
>>>>>> + *    ...
>>>>>> + *
>>>>>> + * Returns: 0 on success, a negative error code on failure.
>>>>>> + */
>>>>>> +#define drm_exec_until_all_locked(exec, expr)        \
>>>>>> +    ({                        \
>>>>>> +        __label__ retry;            \
>>>>>> +        int __ret;                \
>>>>>> +retry:                            \
>>>>>> +        __ret = expr;                \
>>>>>> +        if ((exec)->contended) {        \
>>>>>> +            WARN_ON(__ret != -EDEADLK);    \
>>>>>> +            drm_exec_reset(exec);        \
>>>>>> +            goto retry;            \
>>>>>> +        }                    \
>>>>>> +        ww_acquire_done(&(exec)->ticket);    \
>>>>>> +        __ret;                    \
>>>>>> +    })
>>>>>> +
>>>>>> +void drm_exec_init(struct drm_exec *exec, u32 flags);
>>>>>> +void drm_exec_fini(struct drm_exec *exec);
>>>>>> +void drm_exec_reset(struct drm_exec *exec);
>>>>>> +int drm_exec_prepare_obj(struct drm_exec *exec, struct 
>>>>>> drm_gem_object *obj,
>>>>>> +             unsigned int num_fences);
>>>>>> +int drm_exec_prepare_array(struct drm_exec *exec,
>>>>>> +               struct drm_gem_object **objects,
>>>>>> +               unsigned int num_objects,
>>>>>> +               unsigned int num_fences);
>>>>>> +
>>>>>> +#endif


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-06-21 13:35 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-04 11:51 Common DRM execution context v4 Christian König
2023-05-04 11:51 ` [PATCH 01/13] drm: execution context for GEM buffers v4 Christian König
2023-05-04 14:02   ` Thomas Hellström (Intel)
2023-05-25 20:42   ` Danilo Krummrich
2023-06-14 12:23   ` Boris Brezillon
2023-06-14 12:30     ` Christian König
2023-06-14 13:02       ` Boris Brezillon
2023-06-17 11:54         ` Boris Brezillon
2023-06-19  8:59           ` Thomas Hellström (Intel)
2023-06-19  9:20             ` Christian König
2023-06-19  9:33               ` Thomas Hellström (Intel)
2023-06-19  9:48                 ` Christian König
2023-06-19 11:06                   ` Thomas Hellström (Intel)
2023-06-21 13:35                     ` Christian König
2023-06-19 10:23               ` Boris Brezillon
2023-06-19 10:12             ` Boris Brezillon
2023-06-19 10:44               ` Christian König
2023-06-19 11:05                 ` Boris Brezillon
2023-06-19 12:01                   ` Boris Brezillon
2023-06-19 12:29                 ` Boris Brezillon
2023-06-20  6:47                   ` Boris Brezillon
2023-06-20  7:28                     ` Christian König
2023-05-04 11:51 ` [PATCH 02/13] drm: add drm_exec selftests v2 Christian König
2023-05-04 12:07   ` Maíra Canal
2023-05-04 12:52     ` Christian König
2023-05-04 11:51 ` [PATCH 03/13] drm/amdkfd: switch over to using drm_exec v2 Christian König
2023-05-04 11:51 ` [PATCH 04/13] drm/amdgpu: use drm_exec for GEM and CSA handling Christian König
2023-05-04 11:51 ` [PATCH 05/13] drm/amdgpu: use drm_exec for MES testing Christian König
2023-05-04 11:51 ` [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2 Christian König
2023-06-12 13:16   ` Tatsuyuki Ishi
2023-06-20  4:07   ` Tatsuyuki Ishi
2023-06-20  4:14     ` Tatsuyuki Ishi
2023-06-20  8:13       ` Christian König
2023-06-20  8:12     ` Christian König
2023-06-20  8:16       ` Tatsuyuki Ishi
2023-06-20  9:04         ` Tatsuyuki Ishi
2023-06-20  8:28       ` Boris Brezillon
2023-06-20  8:44         ` Christian König
2023-06-20  9:09           ` Boris Brezillon
2023-06-20  9:14             ` Christian König
2023-06-20  9:20               ` Boris Brezillon
2023-05-04 11:51 ` [PATCH 07/13] drm/radeon: switch over to drm_exec Christian König
2023-05-04 11:51 ` [PATCH 08/13] drm/qxl: switch to using drm_exec Christian König
2023-06-20  9:13   ` Thomas Zimmermann
2023-06-20  9:15     ` Christian König
2023-05-04 11:51 ` [PATCH 09/13] drm/lima: " Christian König
2023-05-04 11:51 ` [PATCH 10/13] drm/virtgpu: " Christian König
2023-05-04 11:51 ` [PATCH 11/13] drm/panfrost: " Christian König
2023-05-04 11:51 ` [PATCH 12/13] drm/v3d: " Christian König
2023-05-04 11:51 ` [PATCH 13/13] drm: remove drm_gem_(un)lock_reservations Christian König
