* [RFC] replacing dma_resv API
@ 2019-08-21 12:31 ` Christian König
  0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Hi everyone,

In previous discussions it surfaced that different drivers use the shared and exclusive fences in the dma_resv object with different meanings.

This is problematic when we share buffers between those drivers, and the differing requirements for implicit and explicit synchronization have led to quite a number of workarounds related to this.

So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean, and to make the object usable for different kinds of workloads independent of the classic DRM command submission interface.

The result is this patch set, which modifies the dma_resv API to get away from a single exclusive fence and multiple shared fences, towards a notion of explicit categories for writers, readers and others.

To do this I came up with a new container called dma_resv_fences which can store both a single fence as well as multiple fences in a dma_fence_array.
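
For orientation, the container itself (added in patch #7 below) is just an RCU protected singleton fence plus an optional staged array:

    struct dma_resv_fences {
            struct dma_fence __rcu *fence;   /* current singleton fence */
            struct dma_fence_array *staged;  /* staged array to replace @fence */
    };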

This actually turned out to be quite a bit simpler, since we don't need any complicated dance between RCU and sequence-count protected updates any more.

Instead we can just grab a reference to the dma_fence_array under RCU and so keep the current state of synchronization alive until we are done with it.
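
Roughly, the read side becomes the following (a minimal sketch with error handling elided, where fences is a struct dma_resv_fences *; dma_resv_fences_get_rcu() in patch #7 wraps the middle part):

    struct dma_fence *fence;

    rcu_read_lock();
    fence = dma_fence_get_rcu_safe(&fences->fence);
    rcu_read_unlock();

    if (fence) {
            /* fence is either a single fence or a dma_fence_array */
            dma_fence_wait(fence, false);
            dma_fence_put(fence);
    }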

This results in both a small performance improvement, since we don't need so many barriers any more, and fewer lines of code in the actual implementation.

Please review and/or comment,
Christian. 



* [PATCH 01/10] dma-buf: make to_dma_fence_array NULL safe
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

A bit surprising that this wasn't already the case.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 include/linux/dma-fence-array.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 303dd712220f..f99cd7eb24e0 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -68,7 +68,7 @@ static inline bool dma_fence_is_array(struct dma_fence *fence)
 static inline struct dma_fence_array *
 to_dma_fence_array(struct dma_fence *fence)
 {
-	if (fence->ops != &dma_fence_array_ops)
+	if (!fence || fence->ops != &dma_fence_array_ops)
 		return NULL;
 
 	return container_of(fence, struct dma_fence_array, base);
-- 
2.17.1
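
Patch #7 relies on this by passing a potentially NULL singleton fence straight in, roughly:

    /* fence may be NULL, a plain fence or a dma_fence_array */
    array = to_dma_fence_array(fence);
    if (array)
            num_fences = array->num_fences;
    else
            num_fences = fence ? 1 : 0;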


* [PATCH 02/10] dma-buf: add dma_fence_array_alloc/free
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

This separates allocation and initialization of the dma_fence_array object
and allows allocating the fences array together with the dma_fence_array.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-array.c | 101 ++++++++++++++++++++----------
 include/linux/dma-fence-array.h   |  50 +++++++++++++--
 2 files changed, 115 insertions(+), 36 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be94..e82f30551aa1 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -109,14 +109,7 @@ static bool dma_fence_array_signaled(struct dma_fence *fence)
 
 static void dma_fence_array_release(struct dma_fence *fence)
 {
-	struct dma_fence_array *array = to_dma_fence_array(fence);
-	unsigned i;
-
-	for (i = 0; i < array->num_fences; ++i)
-		dma_fence_put(array->fences[i]);
-
-	kfree(array->fences);
-	dma_fence_free(fence);
+	dma_fence_array_free(container_of(fence, struct dma_fence_array, base));
 }
 
 const struct dma_fence_ops dma_fence_array_ops = {
@@ -129,52 +122,96 @@ const struct dma_fence_ops dma_fence_array_ops = {
 EXPORT_SYMBOL(dma_fence_array_ops);
 
 /**
- * dma_fence_array_create - Create a custom fence array
+ * dma_fence_array_alloc - Allocate a custom fence array
  * @num_fences:		[in]	number of fences to add in the array
- * @fences:		[in]	array containing the fences
- * @context:		[in]	fence context to use
- * @seqno:		[in]	sequence number to use
- * @signal_on_any:	[in]	signal on any fence in the array
+ * @fences:		[in]	optional array containing the fences
  *
- * Allocate a dma_fence_array object and initialize the base fence with
- * dma_fence_init().
- * In case of error it returns NULL.
+ * Allocate a dma_fence_array object; in case of error it returns NULL.
  *
- * The caller should allocate the fences array with num_fences size
- * and fill it with the fences it wants to add to the object. Ownership of this
- * array is taken and dma_fence_put() is used on each fence on release.
- *
- * If @signal_on_any is true the fence array signals if any fence in the array
- * signals, otherwise it signals when all fences in the array signal.
+ * The fences array is optional; if NULL it is allocated together with the
+ * dma_fence_array object.
  */
-struct dma_fence_array *dma_fence_array_create(int num_fences,
-					       struct dma_fence **fences,
-					       u64 context, unsigned seqno,
-					       bool signal_on_any)
+struct dma_fence_array *dma_fence_array_alloc(int num_fences,
+					      struct dma_fence **fences)
 {
 	struct dma_fence_array *array;
 	size_t size = sizeof(*array);
 
 	/* Allocate the callback structures behind the array. */
 	size += num_fences * sizeof(struct dma_fence_array_cb);
+
+	/* Allocate the fences structures behind the callbacks */
+	if (!fences)
+		size += num_fences * sizeof(struct dma_fence *);
+
 	array = kzalloc(size, GFP_KERNEL);
 	if (!array)
 		return NULL;
 
+	if (!fences) {
+		struct dma_fence_array_cb *cb = (void *)(&array[1]);
+
+		num_fences = dma_fence_array_max_fences(array);
+		fences = (void *)(&cb[num_fences]);
+	}
+	array->fences = fences;
+	return array;
+}
+EXPORT_SYMBOL(dma_fence_array_alloc);
+
+/**
+ * dma_fence_array_init - init a custom fence array
+ * @array:		[in]	the pre-allocated array to init
+ * @context:		[in]	fence context to use
+ * @seqno:		[in]	sequence number to use
+ * @signal_on_any:	[in]	signal on any fence in the array
+ *
+ * Initialize the base fence with dma_fence_init().
+ *
+ * The caller should allocate the fences array with num_fences size
+ * and fill it with the fences it wants to add to the object. Ownership of this
+ * array is taken and dma_fence_put() is used on each fence on release.
+ *
+ * If @signal_on_any is true the fence array signals if any fence in the array
+ * signals, otherwise it signals when all fences in the array signal.
+ */
+void dma_fence_array_init(struct dma_fence_array *array, u64 context,
+			  unsigned int seqno, bool signal_on_any)
+{
 	spin_lock_init(&array->lock);
 	dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
 		       context, seqno);
 	init_irq_work(&array->work, irq_dma_fence_array_work);
 
-	array->num_fences = num_fences;
-	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
-	array->fences = fences;
-
+	atomic_set(&array->num_pending, signal_on_any ? 1 : array->num_fences);
 	array->base.error = PENDING_ERROR;
+}
+EXPORT_SYMBOL(dma_fence_array_init);
 
-	return array;
+/**
+ * dma_fence_array_free - free a dma_fence_array object
+ * @array: the object to free
+ *
+ * Free a dma_fence_array and drop all references to the fences it contains.
+ */
+void dma_fence_array_free(struct dma_fence_array *array)
+{
+	unsigned i;
+
+	if (!array)
+		return;
+
+	for (i = 0; i < array->num_fences; ++i)
+		dma_fence_put(array->fences[i]);
+
+	/* Check if fences are part of the array */
+	if ((u8 *)array->fences < (u8 *)array ||
+	    (u8 *)array->fences > ((u8 *)array) + ksize(array))
+		kfree(array->fences);
+
+	dma_fence_free(&array->base);
 }
-EXPORT_SYMBOL(dma_fence_array_create);
+EXPORT_SYMBOL(dma_fence_array_free);
 
 /**
  * dma_fence_match_context - Check if all fences are from the given context
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index f99cd7eb24e0..be85c06b524d 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -14,6 +14,7 @@
 
 #include <linux/dma-fence.h>
 #include <linux/irq_work.h>
+#include <linux/slab.h>
 
 /**
  * struct dma_fence_array_cb - callback helper for fence array
@@ -74,10 +75,51 @@ to_dma_fence_array(struct dma_fence *fence)
 	return container_of(fence, struct dma_fence_array, base);
 }
 
-struct dma_fence_array *dma_fence_array_create(int num_fences,
-					       struct dma_fence **fences,
-					       u64 context, unsigned seqno,
-					       bool signal_on_any);
+/**
+ * dma_fence_array_max_fences - calculate maximum number of fences
+ * @array:	[in]	the dma_fence_array to use
+ *
+ * Returns the maximum number of fences a dma_fence_array can store.
+ */
+static inline unsigned int
+dma_fence_array_max_fences(struct dma_fence_array *array)
+{
+	return (ksize(array) - sizeof(*array)) /
+		(sizeof(struct dma_fence_array_cb) +
+		 sizeof(struct dma_fence *));
+}
+
+struct dma_fence_array *dma_fence_array_alloc(int num_fences,
+					      struct dma_fence **fences);
+void dma_fence_array_init(struct dma_fence_array *array, u64 context,
+			  unsigned int seqno, bool signal_on_any);
+void dma_fence_array_free(struct dma_fence_array *array);
+
+/**
+ * dma_fence_array_create - Create a custom fence array
+ * @num_fences:		[in]	number of fences to add in the array
+ * @fences:		[in]	array containing the fences
+ * @context:		[in]	fence context to use
+ * @seqno:		[in]	sequence number to use
+ * @signal_on_any:	[in]	signal on any fence in the array
+ *
+ * Wrapper around dma_fence_array_alloc and dma_fence_array_init. Returns NULL
+ * on allocation failure.
+ */
+static inline struct dma_fence_array *
+dma_fence_array_create(int num_fences, struct dma_fence **fences, u64 context,
+		       unsigned seqno, bool signal_on_any)
+{
+	struct dma_fence_array *array;
+
+	array = dma_fence_array_alloc(num_fences, fences);
+	if (!array)
+		return NULL;
+
+	array->num_fences = num_fences;
+	dma_fence_array_init(array, context, seqno, signal_on_any);
+	return array;
+}
 
 bool dma_fence_match_context(struct dma_fence *fence, u64 context);
 
-- 
2.17.1
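
For illustration, open-coding the dma_fence_array_create() wrapper above now looks roughly like this (f1/f2 are placeholder fences):

    struct dma_fence_array *array;

    /* NULL fences: the fence pointers are allocated behind the callbacks */
    array = dma_fence_array_alloc(2, NULL);
    if (!array)
            return -ENOMEM;

    array->fences[0] = dma_fence_get(f1);
    array->fences[1] = dma_fence_get(f2);
    array->num_fences = 2;

    dma_fence_array_init(array, dma_fence_context_alloc(1), 1, false);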


* [PATCH 03/10] dma-buf: add dma_fence_array_recycle
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Try to recycle a dma_fence_array object by dropping the last
reference to it without freeing it.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-array.c | 27 +++++++++++++++++++++++++++
 include/linux/dma-fence-array.h   |  1 +
 2 files changed, 28 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index e82f30551aa1..314cf0e881d9 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -188,6 +188,33 @@ void dma_fence_array_init(struct dma_fence_array *array, u64 context,
 }
 EXPORT_SYMBOL(dma_fence_array_init);
 
+/**
+ * dma_fence_array_reuse - dummy release callback for dma_fence_array_recycle
+ */
+static void dma_fence_array_reuse(struct kref *kref)
+{
+	struct dma_fence_array *array = container_of(kref, typeof(*array),
+						     base.refcount);
+
+	WARN_ON_ONCE(!list_empty(&array->base.cb_list));
+}
+
+/**
+ * dma_fence_array_recycle - try to reuse a dma_fence_array object
+ * @array: array which we should try to reuse
+ *
+ * Try to drop the last reference to a dma_fence_array and so allow it to be
+ * reused.
+ *
+ * Returns true if this was the last reference; in that case the caller can
+ * reuse the array, which is reset into a state where it can be used with
+ * dma_fence_array_init again.
+ */
+bool dma_fence_array_recycle(struct dma_fence_array *array)
+{
+	return kref_put(&array->base.refcount, dma_fence_array_reuse);
+}
+
 /**
  * dma_fence_array_free - free a dma_fence_array object
  * @array: the object to free
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index be85c06b524d..35d1d1e7c93b 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -93,6 +93,7 @@ struct dma_fence_array *dma_fence_array_alloc(int num_fences,
 					      struct dma_fence **fences);
 void dma_fence_array_init(struct dma_fence_array *array, u64 context,
 			  unsigned int seqno, bool signal_on_any);
+bool dma_fence_array_recycle(struct dma_fence_array *array);
 void dma_fence_array_free(struct dma_fence_array *array);
 
 /**
-- 
2.17.1
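
Patch #7 uses this to reuse the staged array; the pattern is roughly:

    /* Drop our reference; if it was the last one the array is idle and
     * can be refilled and re-initialized instead of being freed. */
    if (dma_fence_array_recycle(array)) {
            /* update array->fences[] and array->num_fences here */
            dma_fence_array_init(array, dma_fence_context_alloc(1), 1, false);
    }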


* [PATCH 04/10] dma-buf: add dma_fence_array_for_each
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Makes it easier to extract the fences from a dma_fence_array.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-array.c | 42 +++++++++++++++++++++++++++++++
 include/linux/dma-fence-array.h   | 24 ++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 314cf0e881d9..887e32e89269 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -265,3 +265,45 @@ bool dma_fence_match_context(struct dma_fence *fence, u64 context)
 	return true;
 }
 EXPORT_SYMBOL(dma_fence_match_context);
+
+/**
+ * dma_fence_array_first - return the first fence in an array
+ * @cursor: cursor for the current position
+ * @array: array with the fences
+ *
+ * Returns the first fence in the array or NULL if the array is empty.
+ * If the @array parameter isn't a dma_fence_array it is returned unmodified.
+ */
+struct dma_fence *dma_fence_array_first(struct dma_fence_array_cursor *cursor,
+					struct dma_fence *array)
+{
+	cursor->array = to_dma_fence_array(array);
+	if (!cursor->array)
+		return array;
+
+	if (!cursor->array->num_fences)
+		return NULL;
+
+	cursor->index = 0;
+	return cursor->array->fences[cursor->index++];
+
+}
+EXPORT_SYMBOL(dma_fence_array_first);
+
+/**
+ * dma_fence_array_next - return the next fence in the array
+ * @cursor: cursor for the current position
+ *
+ * Returns the next fence from the array or NULL if there are no more.
+ */
+struct dma_fence *dma_fence_array_next(struct dma_fence_array_cursor *cursor)
+{
+	if (!cursor->array)
+		return NULL;
+
+	if (cursor->index >= cursor->array->num_fences)
+		return NULL;
+
+	return cursor->array->fences[cursor->index++];
+}
+EXPORT_SYMBOL(dma_fence_array_next);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 35d1d1e7c93b..2a71fd003dfa 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -124,4 +124,28 @@ dma_fence_array_create(int num_fences, struct dma_fence **fences, u64 context,
 
 bool dma_fence_match_context(struct dma_fence *fence, u64 context);
 
+/**
+ * struct dma_fence_array_cursor - cursor for the fences in an array
+ */
+struct dma_fence_array_cursor {
+	struct dma_fence_array *array;
+	unsigned int index;
+};
+
+struct dma_fence *dma_fence_array_first(struct dma_fence_array_cursor *cursor,
+					struct dma_fence *array);
+struct dma_fence *dma_fence_array_next(struct dma_fence_array_cursor *cursor);
+
+/**
+ * dma_fence_array_for_each - walk over all fences in a fence array
+ * @fence: the current fence
+ * @cursor: cursor for the walk
+ * @array: potentially a fence array
+ *
+ * Walk over all the fences in the array.
+ */
+#define dma_fence_array_for_each(fence, cursor, array)		\
+	for (fence = dma_fence_array_first(&(cursor), array);	\
+	     fence; fence = dma_fence_array_next(&(cursor)))
+
 #endif /* __LINUX_DMA_FENCE_ARRAY_H */
-- 
2.17.1
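
A minimal usage sketch; thanks to patch #1 this also does the right thing when fence is NULL or a plain fence (zero or one iteration):

    struct dma_fence_array_cursor cursor;
    struct dma_fence *f;

    dma_fence_array_for_each(f, cursor, fence)
            dma_fence_wait(f, false);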


* [PATCH 05/10] dma-buf/resv: add dma_resv_prune_fences
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Add a new dma_resv_prune_fences() function which drops all signaled fences
from a reservation object. This replaces setting a NULL exclusive fence as
the way to prune a completely idle object and lets us release fence memory
earlier.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c                 | 37 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_wait.c   |  3 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  2 +-
 drivers/gpu/drm/i915/i915_vma.c            |  3 +-
 drivers/gpu/drm/ttm/ttm_bo.c               |  2 +-
 include/linux/dma-resv.h                   |  1 +
 6 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 42a8f3f11681..24adc32d36d4 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -301,6 +301,43 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_resv_add_excl_fence);
 
+/**
+ * dma_resv_prune_fences - prune signaled fences from the resv object
+ * @obj: the reservation object to prune
+ *
+ * Prune all signaled fences from the reservation object.
+ */
+void dma_resv_prune_fences(struct dma_resv *obj)
+{
+	struct dma_resv_list *list;
+	struct dma_fence *fence;
+	unsigned int i;
+
+	dma_resv_assert_held(obj);
+
+	fence = dma_resv_get_excl(obj);
+	if (dma_fence_is_signaled(fence)) {
+		RCU_INIT_POINTER(obj->fence_excl, NULL);
+		dma_fence_put(fence);
+	}
+
+	list = dma_resv_get_list(obj);
+	if (!list)
+		return;
+
+	for (i = 0; i < list->shared_count; ++i) {
+		fence = rcu_dereference_protected(list->shared[i],
+						  dma_resv_held(obj));
+
+		if (!dma_fence_is_signaled(fence))
+			continue;
+
+		RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
+		dma_fence_put(fence);
+	}
+}
+EXPORT_SYMBOL(dma_resv_prune_fences);
+
 /**
 * dma_resv_copy_fences - Copy all fences from src to dst.
 * @dst: the destination reservation object
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 8af55cd3e690..a76d36f8fb77 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -85,8 +85,7 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
 	 * signaled.
 	 */
 	if (prune_fences && dma_resv_trylock(resv)) {
-		if (dma_resv_test_signaled_rcu(resv, true))
-			dma_resv_add_excl_fence(resv, NULL);
+		dma_resv_prune_fences(resv);
 		dma_resv_unlock(resv);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 5f82a763e64c..274cf5b19fc9 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -114,7 +114,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			 */
 			if (rcu_access_pointer(resv->fence)) {
 				dma_resv_lock(resv, NULL);
-				dma_resv_add_excl_fence(resv, NULL);
+				dma_resv_prune_fences(resv);
 				dma_resv_unlock(resv);
 			}
 		}
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ebfd03d117cd..fcbe433a968c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -100,8 +100,7 @@ static void __i915_vma_retire(struct i915_active *ref)
 
 	/* Prune the shared fence arrays iff completely idle (inc. external) */
 	if (dma_resv_trylock(obj->base.resv)) {
-		if (dma_resv_test_signaled_rcu(obj->base.resv, true))
-			dma_resv_add_excl_fence(obj->base.resv, NULL);
+		dma_resv_prune_fences(obj->base.resv);
 		dma_resv_unlock(obj->base.resv);
 	}
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 58d1f2b28132..f78f52cc2e6d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1818,7 +1818,7 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
 	if (timeout == 0)
 		return -EBUSY;
 
-	dma_resv_add_excl_fence(bo->base.resv, NULL);
+	dma_resv_prune_fences(bo->base.resv);
 	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_wait);
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..03b0f95682b0 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -279,6 +279,7 @@ int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
 void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
 
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
+void dma_resv_prune_fences(struct dma_resv *obj);
 
 int dma_resv_get_fences_rcu(struct dma_resv *obj,
 			    struct dma_fence **pfence_excl,
-- 
2.17.1
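
All the callers converted above follow the same opportunistic pattern, roughly:

    /* Drop signaled fences if nobody else is using the object right now */
    if (dma_resv_trylock(resv)) {
            dma_resv_prune_fences(resv);
            dma_resv_unlock(resv);
    }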


* [PATCH 06/10] dma-buf/resv: stop pruning shared fences when exclusive is added
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

First step towards the new shared fence container implementation: adding an
exclusive fence no longer prunes the shared fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c              | 16 +---------------
 drivers/gpu/drm/msm/msm_gem.c           | 14 ++++++--------
 drivers/gpu/drm/nouveau/nouveau_fence.c |  2 +-
 3 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 24adc32d36d4..d3a9a3bb15f0 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -271,32 +271,18 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
 	struct dma_fence *old_fence = dma_resv_get_excl(obj);
-	struct dma_resv_list *old;
-	u32 i = 0;
 
 	dma_resv_assert_held(obj);
 
-	old = dma_resv_get_list(obj);
-	if (old)
-		i = old->shared_count;
-
-	if (fence)
-		dma_fence_get(fence);
+	dma_fence_get(fence);
 
 	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
-	if (old)
-		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
-	/* inplace update, no shared fences */
-	while (i--)
-		dma_fence_put(rcu_dereference_protected(old->shared[i],
-						dma_resv_held(obj)));
-
 	dma_fence_put(old_fence);
 }
 EXPORT_SYMBOL(dma_resv_add_excl_fence);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 348a7ad2c044..90e3dc3b927a 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -668,14 +668,12 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
 	int i, ret;
 
 	fobj = dma_resv_get_list(obj->resv);
-	if (!fobj || (fobj->shared_count == 0)) {
-		fence = dma_resv_get_excl(obj->resv);
-		/* don't need to wait on our own fences, since ring is fifo */
-		if (fence && (fence->context != fctx->context)) {
-			ret = dma_fence_wait(fence, true);
-			if (ret)
-				return ret;
-		}
+	fence = dma_resv_get_excl(obj->resv);
+	/* don't need to wait on our own fences, since ring is fifo */
+	if (fence && (fence->context != fctx->context)) {
+		ret = dma_fence_wait(fence, true);
+		if (ret)
+			return ret;
 	}
 
 	if (!exclusive || !fobj)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 8df390078c85..42ddddbb49e4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -350,7 +350,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 	fobj = dma_resv_get_list(resv);
 	fence = dma_resv_get_excl(resv);
 
-	if (fence && (!exclusive || !fobj || !fobj->shared_count)) {
+	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
 
-- 
2.17.1


* [PATCH 07/10] dma-buf/resv: add new fences container implementation
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Add a new container for fences which internally uses
dma_fence_arrays to store the fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c | 221 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-resv.h   |  49 ++++++++
 2 files changed, 270 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index d3a9a3bb15f0..83033b3e8521 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -33,6 +33,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/dma-fence-array.h>
 #include <linux/export.h>
 
 /**
@@ -55,6 +56,226 @@ EXPORT_SYMBOL(reservation_seqcount_class);
 const char reservation_seqcount_string[] = "reservation_seqcount";
 EXPORT_SYMBOL(reservation_seqcount_string);
 
+static void dma_resv_fences_init(struct dma_resv_fences *fences)
+{
+	RCU_INIT_POINTER(fences->fence, NULL);
+	fences->staged = NULL;
+}
+
+static void dma_resv_fences_fini(struct dma_resv_fences *fences)
+{
+	/*
+	 * This object should be dead and all references must have
+	 * been released to it, so no need to be protected with rcu.
+	 */
+	dma_fence_put(rcu_dereference_protected(fences->fence, true));
+	dma_fence_array_free(fences->staged);
+}
+
+/**
+ * dma_resv_fences_reserve - allocate fence slots
+ * @fences: fences object where we need slots
+ * @num_fences: number of fence slots we need
+ *
+ * Make sure that the staged dma_fence_array has free slots for at least
+ * @num_fences new fences in addition to all the existing ones.
+ *
+ * Returns -ENOMEM on allocation failure, 0 otherwise.
+ */
+int dma_resv_fences_reserve(struct dma_resv *obj,
+			    struct dma_resv_fences *fences,
+			    unsigned int num_fences)
+{
+	struct dma_fence *fence = dma_resv_fences_deref(obj, fences);
+	struct dma_fence_array *staged, *array;
+	unsigned int i;
+
+	array = fences->staged;
+	if (!array)
+		array = to_dma_fence_array(fence);
+
+	if (array)
+		num_fences += array->num_fences;
+	else if (fence)
+		num_fences += 1;
+
+	staged = fences->staged;
+	if (staged && dma_fence_array_max_fences(staged) >= num_fences)
+		return 0;
+
+	staged = dma_fence_array_alloc(num_fences, NULL);
+	if (!staged)
+		return -ENOMEM;
+
+	/* Copy over all fences from the old object */
+	if (array) {
+		for (i = 0; i < array->num_fences; ++i) {
+			struct dma_fence *f = array->fences[i];
+
+			staged->fences[i] = dma_fence_get(f);
+		}
+		staged->num_fences = array->num_fences;
+
+	} else if (fence) {
+		staged->fences[0] = dma_fence_get(fence);
+		staged->num_fences = 1;
+
+	} else {
+		staged->num_fences = 0;
+	}
+
+	dma_fence_array_free(fences->staged);
+	fences->staged = staged;
+
+	return 0;
+}
+EXPORT_SYMBOL(dma_resv_fences_reserve);
+
+/**
+ * dma_resv_fences_assign - set the singleton fence
+ * @fences: fences object where to set the fence
+ * @fence: singleton fence for the object
+ *
+ * Internal helper to assign the singleton fence without grabbing a reference.
+ * If the old fence is a dma_fence_array try to recycle it.
+ */
+static void dma_resv_fences_assign(struct dma_resv *obj,
+				   struct dma_resv_fences *fences,
+				   struct dma_fence *fence)
+{
+	struct dma_fence_array *array, *staged;
+	unsigned int num_fences, i;
+	struct dma_fence *old;
+
+	old = dma_resv_fences_deref(obj, fences);
+	rcu_assign_pointer(fences->fence, fence);
+
+	dma_fence_array_free(fences->staged);
+	fences->staged = NULL;
+
+	/* Try to recycle the old fence array */
+	staged = to_dma_fence_array(old);
+	if (!staged)
+		goto drop_old;
+
+	array = to_dma_fence_array(fence);
+	if (array)
+		num_fences = array->num_fences;
+	else
+		num_fences = fence ? 1 : 0;
+
+	if (dma_fence_array_max_fences(staged) < num_fences)
+		goto drop_old;
+
+	/* Try to drop the last reference */
+	if (!dma_fence_array_recycle(staged))
+		return;
+
+	/* Make sure the staged array has the latest fences */
+	if (array) {
+		for (i = 0; i < array->num_fences; ++i) {
+			struct dma_fence *f = array->fences[i];
+
+			if (f == staged->fences[i])
+				continue;
+
+			dma_fence_put(staged->fences[i]);
+			staged->fences[i] = dma_fence_get(f);
+		}
+		for (;i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->num_fences = array->num_fences;
+
+	} else if (fence) {
+		for (i = 0; i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->fences[0] = dma_fence_get(fence);
+		staged->num_fences = 1;
+	} else {
+		for (i = 0; i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->num_fences = 0;
+	}
+
+	fences->staged = staged;
+	return;
+
+drop_old:
+	dma_fence_put(old);
+}
+
+/**
+ * dma_resv_fences_set - set the singleton fence
+ * @fences: fences object where to set the fence
+ * @fence: singleton fence for the object
+ *
+ * Grabs a reference to the new fence and replaces the current singleton fence
+ * with a new one. If the old fence is a dma_fence_array try to recycle it.
+ */
+void dma_resv_fences_set(struct dma_resv *obj,
+			 struct dma_resv_fences *fences,
+			 struct dma_fence *fence)
+{
+	dma_fence_get(fence);
+	dma_resv_fences_assign(obj, fences, fence);
+}
+EXPORT_SYMBOL(dma_resv_fences_set);
+
+/**
+ * dma_resv_fences_add - add a fence to the staged fence_array
+ * @fences: fences object where to add the fence to
+ * @fence: fence to add
+ *
+ * Add a new fence to the staged fence_array.
+ */
+void dma_resv_fences_add(struct dma_resv_fences *fences,
+			 struct dma_fence *fence)
+{
+	struct dma_fence_array *staged = fences->staged;
+	struct dma_fence *old;
+	unsigned int i;
+
+#ifndef CONFIG_DEBUG_MUTEXES
+	for (i = 0; i < staged->num_fences; ++i) {
+		old = staged->fences[i];
+
+		if (old->context == fence->context ||
+		    dma_fence_is_signaled(old)) {
+			dma_fence_put(old);
+			goto replace;
+		}
+	}
+#endif
+
+	BUG_ON(staged->num_fences >= dma_fence_array_max_fences(staged));
+	i = staged->num_fences++;
+
+replace:
+	staged->fences[i] = dma_fence_get(fence);
+}
+EXPORT_SYMBOL(dma_resv_fences_add);
+
+/**
+ * dma_resv_fences_commit - commit the staged dma_fence_array
+ * @obj: the reservation object
+ * @fences: fences object where the commit should happen
+ *
+ * Commit the fences staged in the dma_fence_array and make them visible to
+ * other threads.
+ */
+void dma_resv_fences_commit(struct dma_resv *obj,
+			    struct dma_resv_fences *fences)
+{
+	struct dma_fence_array *staged = fences->staged;
+
+	if (!staged || !staged->num_fences)
+		return;
+
+	fences->staged = NULL;
+	dma_fence_array_init(staged, dma_fence_context_alloc(1), 1, false);
+	dma_resv_fences_assign(obj, fences, &staged->base);
+}
+EXPORT_SYMBOL(dma_resv_fences_commit);
+
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 03b0f95682b0..c70f13fa6789 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -45,10 +45,33 @@
 #include <linux/seqlock.h>
 #include <linux/rcupdate.h>
 
+struct dma_resv;
+
 extern struct ww_class reservation_ww_class;
 extern struct lock_class_key reservation_seqcount_class;
 extern const char reservation_seqcount_string[];
 
+/**
+ * struct dma_resv_fences - fences inside a reservation object
+ * @fence: the current RCU protected singleton fence
+ * @staged: optional staged dma_fence_array to replace @fence
+ */
+struct dma_resv_fences {
+	struct dma_fence __rcu *fence;
+	struct dma_fence_array *staged;
+};
+
+int dma_resv_fences_reserve(struct dma_resv *obj,
+			    struct dma_resv_fences *fences,
+			    unsigned int num_fences);
+void dma_resv_fences_set(struct dma_resv *obj,
+			 struct dma_resv_fences *fences,
+			 struct dma_fence *fence);
+void dma_resv_fences_add(struct dma_resv_fences *fences,
+			 struct dma_fence *fence);
+void dma_resv_fences_commit(struct dma_resv *obj,
+			    struct dma_resv_fences *fences);
+
 /**
  * struct dma_resv_list - a list of shared fences
  * @rcu: for internal use
@@ -80,6 +103,32 @@ struct dma_resv {
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
 #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base)
 
+/**
+ * dma_resv_fences_deref - get singleton fence
+ * @obj: the reservation object
+ * @fences: the fences object
+ *
+ * Returns the singleton fence from a resv_fences object.
+ */
+static inline struct dma_fence *
+dma_resv_fences_deref(struct dma_resv *obj, struct dma_resv_fences *fences)
+{
+	return rcu_dereference_protected(fences->fence,
+					 dma_resv_held(obj));
+}
+
+/**
+ * dma_resv_fences_get_rcu - RCU get single fence
+ * @fences: fences structure to get the fence reference from
+ *
+ * Get a reference to the single fence representing the synchronization.
+ */
+static inline struct dma_fence *
+dma_resv_fences_get_rcu(struct dma_resv_fences *fences)
+{
+	return dma_fence_get_rcu_safe(&fences->fence);
+}
+
 /**
  * dma_resv_get_list - get the reservation object's
  * shared fence list, with update-side lock held
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 07/10] dma-buf/resv: add new fences container implementation
@ 2019-08-21 12:31   ` Christian König
  0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Add a new container for fences which internally uses a
dma_fence_array to store the fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
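A rough sketch of the intended write-side flow under the reservation
lock (add_read_fence() is a made-up helper for illustration, only the
dma_resv_fences_* functions are introduced by this patch):

static int add_read_fence(struct dma_resv *obj,
			  struct dma_resv_fences *fences,
			  struct dma_fence *fence)
{
	int r;

	dma_resv_assert_held(obj);

	/* make room for one more fence in the staged array */
	r = dma_resv_fences_reserve(obj, fences, 1);
	if (r)
		return r;

	/* stage the fence, replacing a signaled or same-context one */
	dma_resv_fences_add(fences, fence);

	/* publish the staged array as the new singleton fence */
	dma_resv_fences_commit(obj, fences);
	return 0;
}
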
 drivers/dma-buf/dma-resv.c | 221 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-resv.h   |  49 ++++++++
 2 files changed, 270 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index d3a9a3bb15f0..83033b3e8521 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -33,6 +33,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/dma-fence-array.h>
 #include <linux/export.h>
 
 /**
@@ -55,6 +56,226 @@ EXPORT_SYMBOL(reservation_seqcount_class);
 const char reservation_seqcount_string[] = "reservation_seqcount";
 EXPORT_SYMBOL(reservation_seqcount_string);
 
+static void dma_resv_fences_init(struct dma_resv_fences *fences)
+{
+	RCU_INIT_POINTER(fences->fence, NULL);
+	fences->staged = NULL;
+}
+
+static void dma_resv_fences_fini(struct dma_resv_fences *fences)
+{
+	/*
+	 * This object should be dead and all references must have
+	 * been released to it, so no need to be protected with rcu.
+	 */
+	dma_fence_put(rcu_dereference_protected(fences->fence, true));
+	dma_fence_array_free(fences->staged);
+}
+
+/**
+ * dma_resv_fences_reserve - allocate fence slots
+ * @obj: the reservation object
+ * @fences: fences object where we need slots
+ * @num_fences: number of additional fence slots we need
+ *
+ * Make sure the staged dma_fence_array has room for all currently installed
+ * fences plus at least @num_fences additional ones.
+ *
+ * Returns -ENOMEM on allocation failure, 0 otherwise.
+ */
+int dma_resv_fences_reserve(struct dma_resv *obj,
+			    struct dma_resv_fences *fences,
+			    unsigned int num_fences)
+{
+	struct dma_fence *fence = dma_resv_fences_deref(obj, fences);
+	struct dma_fence_array *staged, *array;
+	unsigned int i;
+
+	array = fences->staged;
+	if (!array)
+		array = to_dma_fence_array(fence);
+
+	if (array)
+		num_fences += array->num_fences;
+	else if (fence)
+		num_fences += 1;
+
+	staged = fences->staged;
+	if (staged && dma_fence_array_max_fences(staged) >= num_fences)
+		return 0;
+
+	staged = dma_fence_array_alloc(num_fences, NULL);
+	if (!staged)
+		return -ENOMEM;
+
+	/* Copy over all fences from the old object */
+	if (array) {
+		for (i = 0; i < array->num_fences; ++i) {
+			struct dma_fence *f = array->fences[i];
+
+			staged->fences[i] = dma_fence_get(f);
+		}
+		staged->num_fences = array->num_fences;
+
+	} else if (fence) {
+		staged->fences[0] = dma_fence_get(fence);
+		staged->num_fences = 1;
+
+	} else {
+		staged->num_fences = 0;
+	}
+
+	dma_fence_array_free(fences->staged);
+	fences->staged = staged;
+
+	return 0;
+}
+EXPORT_SYMBOL(dma_resv_fences_reserve);
+
+/**
+ * dma_resv_fences_assign - set the singleton fence
+ * @obj: the reservation object
+ * @fences: fences object where to set the fence
+ * @fence: singleton fence for the object
+ *
+ * Internal helper to assign the singleton fence without grabbing a reference.
+ * If the old fence is a dma_fence_array try to recycle it.
+ */
+static void dma_resv_fences_assign(struct dma_resv *obj,
+				   struct dma_resv_fences *fences,
+				   struct dma_fence *fence)
+{
+	struct dma_fence_array *array, *staged;
+	unsigned int num_fences, i;
+	struct dma_fence *old;
+
+	old = dma_resv_fences_deref(obj, fences);
+	rcu_assign_pointer(fences->fence, fence);
+
+	dma_fence_array_free(fences->staged);
+	fences->staged = NULL;
+
+	/* Try to recycle the old fence array */
+	staged = to_dma_fence_array(old);
+	if (!staged)
+		goto drop_old;
+
+	array = to_dma_fence_array(fence);
+	if (array)
+		num_fences = array->num_fences;
+	else
+		num_fences = fence ? 1 : 0;
+
+	if (dma_fence_array_max_fences(staged) < num_fences)
+		goto drop_old;
+
+	/* Try to drop the last reference */
+	if (!dma_fence_array_recycle(staged))
+		return;
+
+	/* Make sure the staged array has the latest fences */
+	if (array) {
+		for (i = 0; i < array->num_fences; ++i) {
+			struct dma_fence *f = array->fences[i];
+
+			if (f == staged->fences[i])
+				continue;
+
+			dma_fence_put(staged->fences[i]);
+			staged->fences[i] = dma_fence_get(f);
+		}
+		for (; i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->num_fences = array->num_fences;
+
+	} else if (fence) {
+		for (i = 0; i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->fences[0] = dma_fence_get(fence);
+		staged->num_fences = 1;
+	} else {
+		for (i = 0; i < staged->num_fences; ++i)
+			dma_fence_put(staged->fences[i]);
+		staged->num_fences = 0;
+	}
+
+	fences->staged = staged;
+	return;
+
+drop_old:
+	dma_fence_put(old);
+}
+
+/**
+ * dma_resv_fences_set - set the singleton fence
+ * @obj: the reservation object
+ * @fences: fences object where to set the fence
+ * @fence: singleton fence for the object
+ *
+ * Grabs a reference to the new fence and replaces the current singleton
+ * fence with it. If the old fence is a dma_fence_array try to recycle it.
+ */
+void dma_resv_fences_set(struct dma_resv *obj,
+			 struct dma_resv_fences *fences,
+			 struct dma_fence *fence)
+{
+	dma_fence_get(fence);
+	dma_resv_fences_assign(obj, fences, fence);
+}
+EXPORT_SYMBOL(dma_resv_fences_set);
+
+/**
+ * dma_resv_fences_add - add a fence to the staged fence_array
+ * @fences: fences object where to add the fence to
+ * @fence: fence to add
+ *
+ * Add a new fence to the staged fence_array.
+ */
+void dma_resv_fences_add(struct dma_resv_fences *fences,
+			 struct dma_fence *fence)
+{
+	struct dma_fence_array *staged = fences->staged;
+	struct dma_fence *old;
+	unsigned int i;
+
+#ifndef CONFIG_DEBUG_MUTEXES
+	for (i = 0; i < staged->num_fences; ++i) {
+		old = staged->fences[i];
+
+		if (old->context == fence->context ||
+		    dma_fence_is_signaled(old)) {
+			dma_fence_put(old);
+			goto replace;
+		}
+	}
+#endif
+
+	BUG_ON(staged->num_fences >= dma_fence_array_max_fences(staged));
+	i = staged->num_fences++;
+
+replace:
+	staged->fences[i] = dma_fence_get(fence);
+}
+EXPORT_SYMBOL(dma_resv_fences_add);
+
+/**
+ * dma_resv_fences_commit - commit the staged dma_fence_array
+ * @obj: the reservation object
+ * @fences: fences object where the commit should happen
+ *
+ * Commit the fences staged in the dma_fence_array and make them visible to
+ * other threads.
+ */
+void dma_resv_fences_commit(struct dma_resv *obj,
+			    struct dma_resv_fences *fences)
+{
+	struct dma_fence_array *staged = fences->staged;
+
+	if (!staged || !staged->num_fences)
+		return;
+
+	fences->staged = NULL;
+	dma_fence_array_init(staged, dma_fence_context_alloc(1), 1, false);
+	dma_resv_fences_assign(obj, fences, &staged->base);
+}
+EXPORT_SYMBOL(dma_resv_fences_commit);
+
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 03b0f95682b0..c70f13fa6789 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -45,10 +45,33 @@
 #include <linux/seqlock.h>
 #include <linux/rcupdate.h>
 
+struct dma_resv;
+
 extern struct ww_class reservation_ww_class;
 extern struct lock_class_key reservation_seqcount_class;
 extern const char reservation_seqcount_string[];
 
+/**
+ * struct dma_resv_fences - fences inside a reservation object
+ * @fence: the current RCU protected singleton fence
+ * @staged: optional staged dma_fence_array to replace @fence
+ */
+struct dma_resv_fences {
+	struct dma_fence __rcu *fence;
+	struct dma_fence_array *staged;
+};
+
+int dma_resv_fences_reserve(struct dma_resv *obj,
+			    struct dma_resv_fences *fences,
+			    unsigned int num_fences);
+void dma_resv_fences_set(struct dma_resv *obj,
+			 struct dma_resv_fences *fences,
+			 struct dma_fence *fence);
+void dma_resv_fences_add(struct dma_resv_fences *fences,
+			 struct dma_fence *fence);
+void dma_resv_fences_commit(struct dma_resv *obj,
+			    struct dma_resv_fences *fences);
+
 /**
  * struct dma_resv_list - a list of shared fences
  * @rcu: for internal use
@@ -80,6 +103,32 @@ struct dma_resv {
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
 #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base)
 
+/**
+ * dma_resv_fences_deref - get singleton fence
+ * @obj: the reservation object
+ * @fences: the fences object
+ *
+ * Returns the singleton fence from a resv_fences object.
+ */
+static inline struct dma_fence *
+dma_resv_fences_deref(struct dma_resv *obj, struct dma_resv_fences *fences)
+{
+	return rcu_dereference_protected(fences->fence,
+					 dma_resv_held(obj));
+}
+
+/**
+ * dma_resv_fences_get_rcu - RCU get single fence
+ * @fences: fences structure to get the fence reference from
+ *
+ * Get a reference to the single fence representing the synchronization.
+ */
+static inline struct dma_fence *
+dma_resv_fences_get_rcu(struct dma_resv_fences *fences)
+{
+	return dma_fence_get_rcu_safe(&fences->fence);
+}
+
 /**
  * dma_resv_get_list - get the reservation object's
  * shared fence list, with update-side lock held
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Replace the old shared fences with the new readers container
and change all callers accordingly.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
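For illustration, the lockless read side now boils down to grabbing two
references instead of looping on the seqcount (snapshot_fences() is a
made-up name, the calls are the ones used throughout this patch):

static void snapshot_fences(struct dma_resv *obj,
			    struct dma_fence **excl,
			    struct dma_fence **readers)
{
	rcu_read_lock();
	*excl = dma_fence_get_rcu_safe(&obj->fence_excl);
	*readers = dma_resv_fences_get_rcu(&obj->readers);
	rcu_read_unlock();

	/* both references must be released with dma_fence_put() */
}
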
 drivers/dma-buf/dma-buf.c                     |  99 +---
 drivers/dma-buf/dma-resv.c                    | 524 +++---------------
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  58 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c      |  17 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   3 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_busy.c      |  20 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c    |   2 +-
 drivers/gpu/drm/i915/i915_vma.c               |   4 +-
 drivers/gpu/drm/lima/lima_gem.c               |   4 +-
 drivers/gpu/drm/msm/msm_gem.c                 |  14 +-
 drivers/gpu/drm/msm/msm_gem_submit.c          |   4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c          |   2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c       |  19 +-
 drivers/gpu/drm/qxl/qxl_debugfs.c             |  11 +-
 drivers/gpu/drm/qxl/qxl_release.c             |   5 +-
 drivers/gpu/drm/radeon/radeon_object.c        |   2 +-
 drivers/gpu/drm/radeon/radeon_sync.c          |  14 +-
 drivers/gpu/drm/radeon/radeon_vm.c            |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  22 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c        |  12 +-
 drivers/gpu/drm/vc4/vc4_gem.c                 |   4 +-
 drivers/gpu/drm/vgem/vgem_fence.c             |   4 +-
 include/linux/dma-resv.h                      |  42 +-
 28 files changed, 224 insertions(+), 747 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 433d91d710e4..220cfa9ca82b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/dma-buf.h>
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-array.h>
 #include <linux/anon_inodes.h>
 #include <linux/export.h>
 #include <linux/debugfs.h>
@@ -194,12 +195,10 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 
 static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 {
+	struct dma_fence *fence_excl, *readers;
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence_excl;
 	__poll_t events;
-	unsigned shared_count, seq;
 
 	dmabuf = file->private_data;
 	if (!dmabuf || !dmabuf->resv)
@@ -213,26 +212,15 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 	if (!events)
 		return 0;
 
-retry:
-	seq = read_seqcount_begin(&resv->seq);
 	rcu_read_lock();
+	fence_excl = dma_fence_get_rcu_safe(&resv->fence_excl);
+	readers = dma_resv_fences_get_rcu(&resv->readers);
 
-	fobj = rcu_dereference(resv->fence);
-	if (fobj)
-		shared_count = fobj->shared_count;
-	else
-		shared_count = 0;
-	fence_excl = rcu_dereference(resv->fence_excl);
-	if (read_seqcount_retry(&resv->seq, seq)) {
-		rcu_read_unlock();
-		goto retry;
-	}
-
-	if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
+	if (fence_excl && (!(events & EPOLLOUT) || !readers)) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
 		__poll_t pevents = EPOLLIN;
 
-		if (shared_count == 0)
+		if (!readers)
 			pevents |= EPOLLOUT;
 
 		spin_lock_irq(&dmabuf->poll.lock);
@@ -244,12 +232,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		spin_unlock_irq(&dmabuf->poll.lock);
 
 		if (events & pevents) {
-			if (!dma_fence_get_rcu(fence_excl)) {
-				/* force a recheck */
-				events &= ~pevents;
-				dma_buf_poll_cb(NULL, &dcb->cb);
-			} else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
-							   dma_buf_poll_cb)) {
+			if (!dma_fence_add_callback(fence_excl, &dcb->cb,
+						    dma_buf_poll_cb)) {
 				events &= ~pevents;
 				dma_fence_put(fence_excl);
 			} else {
@@ -263,9 +247,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		}
 	}
 
-	if ((events & EPOLLOUT) && shared_count > 0) {
+	if ((events & EPOLLOUT) && readers) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
-		int i;
 
 		/* Only queue a new callback if no event has fired yet */
 		spin_lock_irq(&dmabuf->poll.lock);
@@ -278,32 +261,15 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		if (!(events & EPOLLOUT))
 			goto out;
 
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
-			if (!dma_fence_get_rcu(fence)) {
-				/*
-				 * fence refcount dropped to zero, this means
-				 * that fobj has been freed
-				 *
-				 * call dma_buf_poll_cb and force a recheck!
-				 */
-				events &= ~EPOLLOUT;
-				dma_buf_poll_cb(NULL, &dcb->cb);
-				break;
-			}
-			if (!dma_fence_add_callback(fence, &dcb->cb,
-						    dma_buf_poll_cb)) {
-				dma_fence_put(fence);
-				events &= ~EPOLLOUT;
-				break;
-			}
-			dma_fence_put(fence);
-		}
-
-		/* No callback queued, wake up any additional waiters. */
-		if (i == shared_count)
+		if (!dma_fence_add_callback(readers, &dcb->cb,
+					    dma_buf_poll_cb)) {
+			dma_fence_put(readers);
+			events &= ~EPOLLOUT;
+		} else {
+			/* No callback queued, wake up any additional waiters. */
+			dma_fence_put(readers);
 			dma_buf_poll_cb(NULL, &dcb->cb);
+		}
 	}
 
 out:
@@ -1154,11 +1120,10 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 	int ret;
 	struct dma_buf *buf_obj;
 	struct dma_buf_attachment *attach_obj;
+	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
-	unsigned seq;
-	int count = 0, attach_count, shared_count, i;
+	struct dma_fence *fence, *readers;
+	int count = 0, attach_count;
 	size_t size = 0;
 
 	ret = mutex_lock_interruptible(&db_list.lock);
@@ -1188,32 +1153,20 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 				buf_obj->name ?: "");
 
 		robj = buf_obj->resv;
-		while (true) {
-			seq = read_seqcount_begin(&robj->seq);
-			rcu_read_lock();
-			fobj = rcu_dereference(robj->fence);
-			shared_count = fobj ? fobj->shared_count : 0;
-			fence = rcu_dereference(robj->fence_excl);
-			if (!read_seqcount_retry(&robj->seq, seq))
-				break;
-			rcu_read_unlock();
-		}
-
+		rcu_read_lock();
+		fence = rcu_dereference(robj->fence_excl);
+		readers = dma_resv_fences_get_rcu(&robj->readers);
 		if (fence)
 			seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
-		for (i = 0; i < shared_count; i++) {
-			fence = rcu_dereference(fobj->shared[i]);
-			if (!dma_fence_get_rcu(fence))
-				continue;
-			seq_printf(s, "\tShared fence: %s %s %ssignalled\n",
+		dma_fence_array_for_each(fence, cursor, readers)
+			seq_printf(s, "\tReaders fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
-			dma_fence_put(fence);
-		}
+		dma_fence_put(readers);
 		rcu_read_unlock();
 
 		seq_puts(s, "\tAttached Devices:\n");
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 83033b3e8521..8723af0a7a4d 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -39,12 +39,11 @@
 /**
  * DOC: Reservation Object Overview
  *
- * The reservation object provides a mechanism to manage shared and
+ * The reservation object provides a mechanism to manage readers and
  * exclusive fences associated with a buffer.  A reservation object
  * can have attached one exclusive fence (normally associated with
- * write operations) or N shared fences (read operations).  The RCU
- * mechanism is used to protect read access to fences from locked
- * write-side updates.
+ * write operations) or N read fences.  The RCU mechanism is used to
+ * protect read access to fences from locked write-side updates.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
@@ -235,17 +234,18 @@ void dma_resv_fences_add(struct dma_resv_fences *fences,
 	struct dma_fence *old;
 	unsigned int i;
 
-#ifndef CONFIG_DEBUG_MUTEXES
 	for (i = 0; i < staged->num_fences; ++i) {
 		old = staged->fences[i];
 
-		if (old->context == fence->context ||
-		    dma_fence_is_signaled(old)) {
+		if (old->context == fence->context
+#ifndef CONFIG_DEBUG_MUTEXES
+		    || dma_fence_is_signaled(old)
+#endif
+		    ) {
 			dma_fence_put(old);
 			goto replace;
 		}
 	}
-#endif
 
 	BUG_ON(staged->num_fences >= dma_fence_array_max_fences(staged));
 	i = staged->num_fences++;
@@ -276,46 +276,6 @@ void dma_resv_fences_commit(struct dma_resv *obj,
 }
 EXPORT_SYMBOL(dma_resv_fences_commit);
 
-/**
- * dma_resv_list_alloc - allocate fence list
- * @shared_max: number of fences we need space for
- *
- * Allocate a new dma_resv_list and make sure to correctly initialize
- * shared_max.
- */
-static struct dma_resv_list *dma_resv_list_alloc(unsigned int shared_max)
-{
-	struct dma_resv_list *list;
-
-	list = kmalloc(offsetof(typeof(*list), shared[shared_max]), GFP_KERNEL);
-	if (!list)
-		return NULL;
-
-	list->shared_max = (ksize(list) - offsetof(typeof(*list), shared)) /
-		sizeof(*list->shared);
-
-	return list;
-}
-
-/**
- * dma_resv_list_free - free fence list
- * @list: list to free
- *
- * Free a dma_resv_list and make sure to drop all references.
- */
-static void dma_resv_list_free(struct dma_resv_list *list)
-{
-	unsigned int i;
-
-	if (!list)
-		return;
-
-	for (i = 0; i < list->shared_count; ++i)
-		dma_fence_put(rcu_dereference_protected(list->shared[i], true));
-
-	kfree_rcu(list, rcu);
-}
-
 /**
  * dma_resv_init - initialize a reservation object
  * @obj: the reservation object
@@ -326,8 +286,8 @@ void dma_resv_init(struct dma_resv *obj)
 
 	__seqcount_init(&obj->seq, reservation_seqcount_string,
 			&reservation_seqcount_class);
-	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
+	dma_resv_fences_init(&obj->readers);
 }
 EXPORT_SYMBOL(dma_resv_init);
 
@@ -337,151 +297,16 @@ EXPORT_SYMBOL(dma_resv_init);
  */
 void dma_resv_fini(struct dma_resv *obj)
 {
-	struct dma_resv_list *fobj;
-	struct dma_fence *excl;
-
 	/*
 	 * This object should be dead and all references must have
 	 * been released to it, so no need to be protected with rcu.
 	 */
-	excl = rcu_dereference_protected(obj->fence_excl, 1);
-	if (excl)
-		dma_fence_put(excl);
-
-	fobj = rcu_dereference_protected(obj->fence, 1);
-	dma_resv_list_free(fobj);
+	dma_fence_put(rcu_dereference_protected(obj->fence_excl, true));
+	dma_resv_fences_fini(&obj->readers);
 	ww_mutex_destroy(&obj->lock);
 }
 EXPORT_SYMBOL(dma_resv_fini);
 
-/**
- * dma_resv_reserve_shared - Reserve space to add shared fences to
- * a dma_resv.
- * @obj: reservation object
- * @num_fences: number of fences we want to add
- *
- * Should be called before dma_resv_add_shared_fence().  Must
- * be called with obj->lock held.
- *
- * RETURNS
- * Zero for success, or -errno
- */
-int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences)
-{
-	struct dma_resv_list *old, *new;
-	unsigned int i, j, k, max;
-
-	dma_resv_assert_held(obj);
-
-	old = dma_resv_get_list(obj);
-
-	if (old && old->shared_max) {
-		if ((old->shared_count + num_fences) <= old->shared_max)
-			return 0;
-		else
-			max = max(old->shared_count + num_fences,
-				  old->shared_max * 2);
-	} else {
-		max = 4;
-	}
-
-	new = dma_resv_list_alloc(max);
-	if (!new)
-		return -ENOMEM;
-
-	/*
-	 * no need to bump fence refcounts, rcu_read access
-	 * requires the use of kref_get_unless_zero, and the
-	 * references from the old struct are carried over to
-	 * the new.
-	 */
-	for (i = 0, j = 0, k = max; i < (old ? old->shared_count : 0); ++i) {
-		struct dma_fence *fence;
-
-		fence = rcu_dereference_protected(old->shared[i],
-						  dma_resv_held(obj));
-		if (dma_fence_is_signaled(fence))
-			RCU_INIT_POINTER(new->shared[--k], fence);
-		else
-			RCU_INIT_POINTER(new->shared[j++], fence);
-	}
-	new->shared_count = j;
-
-	/*
-	 * We are not changing the effective set of fences here so can
-	 * merely update the pointer to the new array; both existing
-	 * readers and new readers will see exactly the same set of
-	 * active (unsignaled) shared fences. Individual fences and the
-	 * old array are protected by RCU and so will not vanish under
-	 * the gaze of the rcu_read_lock() readers.
-	 */
-	rcu_assign_pointer(obj->fence, new);
-
-	if (!old)
-		return 0;
-
-	/* Drop the references to the signaled fences */
-	for (i = k; i < max; ++i) {
-		struct dma_fence *fence;
-
-		fence = rcu_dereference_protected(new->shared[i],
-						  dma_resv_held(obj));
-		dma_fence_put(fence);
-	}
-	kfree_rcu(old, rcu);
-
-	return 0;
-}
-EXPORT_SYMBOL(dma_resv_reserve_shared);
-
-/**
- * dma_resv_add_shared_fence - Add a fence to a shared slot
- * @obj: the reservation object
- * @fence: the shared fence to add
- *
- * Add a fence to a shared slot, obj->lock must be held, and
- * dma_resv_reserve_shared() has been called.
- */
-void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
-{
-	struct dma_resv_list *fobj;
-	struct dma_fence *old;
-	unsigned int i, count;
-
-	dma_fence_get(fence);
-
-	dma_resv_assert_held(obj);
-
-	fobj = dma_resv_get_list(obj);
-	count = fobj->shared_count;
-
-	preempt_disable();
-	write_seqcount_begin(&obj->seq);
-
-	for (i = 0; i < count; ++i) {
-
-		old = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(obj));
-		if (old->context == fence->context ||
-		    dma_fence_is_signaled(old))
-			goto replace;
-	}
-
-	BUG_ON(fobj->shared_count >= fobj->shared_max);
-	old = NULL;
-	count++;
-
-replace:
-	RCU_INIT_POINTER(fobj->shared[i], fence);
-	/* pointer update must be visible before we extend the shared_count */
-	smp_store_mb(fobj->shared_count, count);
-
-	write_seqcount_end(&obj->seq);
-	preempt_enable();
-	dma_fence_put(old);
-}
-EXPORT_SYMBOL(dma_resv_add_shared_fence);
-
 /**
  * dma_resv_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
@@ -501,6 +326,7 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
+	dma_resv_fences_set(obj, &obj->readers, fence);
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
@@ -516,9 +342,7 @@ EXPORT_SYMBOL(dma_resv_add_excl_fence);
  */
 void dma_resv_prune_fences(struct dma_resv *obj)
 {
-	struct dma_resv_list *list;
 	struct dma_fence *fence;
-	unsigned int i;
 
 	dma_resv_assert_held(obj);
 
@@ -528,20 +352,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
 		dma_fence_put(fence);
 	}
 
-	list = dma_resv_get_list(obj);
-	if (!list)
-		return;
-
-	for (i = 0; i < list->shared_count; ++i) {
-		fence = rcu_dereference_protected(list->shared[i],
-						  dma_resv_held(obj));
-
-		if (!dma_fence_is_signaled(fence))
-			continue;
-
-		RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
-		dma_fence_put(fence);
-	}
+	fence = dma_resv_fences_deref(obj, &obj->readers);
+	if (fence && dma_fence_is_signaled(fence))
+		dma_resv_fences_set(obj, &obj->readers, NULL);
 }
 EXPORT_SYMBOL(dma_resv_prune_fences);
 
@@ -554,73 +367,25 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
 */
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 {
-	struct dma_resv_list *src_list, *dst_list;
-	struct dma_fence *old, *new;
-	unsigned i;
+	struct dma_fence *old, *excl, *readers;
 
 	dma_resv_assert_held(dst);
 
 	rcu_read_lock();
-	src_list = rcu_dereference(src->fence);
-
-retry:
-	if (src_list) {
-		unsigned shared_count = src_list->shared_count;
-
-		rcu_read_unlock();
-
-		dst_list = dma_resv_list_alloc(shared_count);
-		if (!dst_list)
-			return -ENOMEM;
-
-		rcu_read_lock();
-		src_list = rcu_dereference(src->fence);
-		if (!src_list || src_list->shared_count > shared_count) {
-			kfree(dst_list);
-			goto retry;
-		}
-
-		dst_list->shared_count = 0;
-		for (i = 0; i < src_list->shared_count; ++i) {
-			struct dma_fence *fence;
-
-			fence = rcu_dereference(src_list->shared[i]);
-			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				     &fence->flags))
-				continue;
-
-			if (!dma_fence_get_rcu(fence)) {
-				dma_resv_list_free(dst_list);
-				src_list = rcu_dereference(src->fence);
-				goto retry;
-			}
-
-			if (dma_fence_is_signaled(fence)) {
-				dma_fence_put(fence);
-				continue;
-			}
-
-			rcu_assign_pointer(dst_list->shared[dst_list->shared_count++], fence);
-		}
-	} else {
-		dst_list = NULL;
-	}
-
-	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	excl = dma_fence_get_rcu_safe(&src->fence_excl);
+	readers = dma_resv_fences_get_rcu(&src->readers);
 	rcu_read_unlock();
 
-	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(dst->fence_excl, new);
-	RCU_INIT_POINTER(dst->fence, dst_list);
+	RCU_INIT_POINTER(dst->fence_excl, excl);
+	dma_resv_fences_set(dst, &dst->readers, readers);
 	write_seqcount_end(&dst->seq);
 	preempt_enable();
 
-	dma_resv_list_free(src_list);
 	dma_fence_put(old);
+	dma_fence_put(readers);
 
 	return 0;
@@ -628,7 +393,7 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 EXPORT_SYMBOL(dma_resv_copy_fences);
 
 /**
- * dma_resv_get_fences_rcu - Get an object's shared and exclusive
+ * dma_resv_get_fences_rcu - Get an object's readers and exclusive
  * fences without update side lock held
  * @obj: the reservation object
  * @pfence_excl: the returned exclusive fence (or NULL)
@@ -645,92 +410,51 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 			    unsigned *pshared_count,
 			    struct dma_fence ***pshared)
 {
-	struct dma_fence **shared = NULL;
-	struct dma_fence *fence_excl;
+	struct dma_fence *excl, *readers;
+	struct dma_fence **shared;
 	unsigned int shared_count;
-	int ret = 1;
-
-	do {
-		struct dma_resv_list *fobj;
-		unsigned int i, seq;
-		size_t sz = 0;
-
-		shared_count = i = 0;
-
-		rcu_read_lock();
-		seq = read_seqcount_begin(&obj->seq);
-
-		fence_excl = rcu_dereference(obj->fence_excl);
-		if (fence_excl && !dma_fence_get_rcu(fence_excl))
-			goto unlock;
-
-		fobj = rcu_dereference(obj->fence);
-		if (fobj)
-			sz += sizeof(*shared) * fobj->shared_max;
-
-		if (!pfence_excl && fence_excl)
-			sz += sizeof(*shared);
-
-		if (sz) {
-			struct dma_fence **nshared;
-
-			nshared = krealloc(shared, sz,
-					   GFP_NOWAIT | __GFP_NOWARN);
-			if (!nshared) {
-				rcu_read_unlock();
-
-				dma_fence_put(fence_excl);
-				fence_excl = NULL;
-
-				nshared = krealloc(shared, sz, GFP_KERNEL);
-				if (nshared) {
-					shared = nshared;
-					continue;
-				}
-
-				ret = -ENOMEM;
-				break;
-			}
-			shared = nshared;
-			shared_count = fobj ? fobj->shared_count : 0;
-			for (i = 0; i < shared_count; ++i) {
-				shared[i] = rcu_dereference(fobj->shared[i]);
-				if (!dma_fence_get_rcu(shared[i]))
-					break;
-			}
-		}
 
-		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
-			while (i--)
-				dma_fence_put(shared[i]);
-			dma_fence_put(fence_excl);
-			goto unlock;
-		}
+	rcu_read_lock();
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
+	rcu_read_unlock();
 
-		ret = 0;
-unlock:
-		rcu_read_unlock();
-	} while (ret);
+	shared_count = readers ? 1 : 0;
+	if (excl && !pfence_excl)
+		++shared_count;
 
-	if (pfence_excl)
-		*pfence_excl = fence_excl;
-	else if (fence_excl)
-		shared[++shared_count] = fence_excl;
+	if (shared_count) {
+		shared = kmalloc_array(shared_count, sizeof(*shared),
+				       GFP_KERNEL);
+		if (!shared) {
+			dma_fence_put(excl);
+			dma_fence_put(readers);
+			return -ENOMEM;
+		}
 
-	if (!shared_count) {
-		kfree(shared);
-		shared = NULL;
+		shared_count = 0;
+		if (readers)
+			shared[shared_count++] = readers;
+		if (excl && !pfence_excl)
+			shared[shared_count++] = excl;
+
+		*pshared = shared;
+		*pshared_count = shared_count;
+	} else {
+		*pshared = NULL;
+		*pshared_count = 0;
 	}
 
-	*pshared_count = shared_count;
-	*pshared = shared;
-	return ret;
+	if (pfence_excl)
+		*pfence_excl = excl;
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences_rcu);
 
 /**
  * dma_resv_wait_timeout_rcu - Wait on reservation's objects
- * shared and/or exclusive fences.
+ * readers and/or exclusive fences.
  * @obj: the reservation object
  * @wait_all: if true, wait on all fences, else wait on just exclusive fence
  * @intr: if true, do interruptible wait
@@ -744,93 +468,29 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			       bool wait_all, bool intr,
 			       unsigned long timeout)
 {
-	struct dma_fence *fence;
-	unsigned seq, shared_count;
+	struct dma_fence *excl, *readers;
 	long ret = timeout ? timeout : 1;
-	int i;
 
-retry:
-	shared_count = 0;
-	seq = read_seqcount_begin(&obj->seq);
 	rcu_read_lock();
-	i = -1;
-
-	fence = rcu_dereference(obj->fence_excl);
-	if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-		if (!dma_fence_get_rcu(fence))
-			goto unlock_retry;
-
-		if (dma_fence_is_signaled(fence)) {
-			dma_fence_put(fence);
-			fence = NULL;
-		}
-
-	} else {
-		fence = NULL;
-	}
-
-	if (wait_all) {
-		struct dma_resv_list *fobj = rcu_dereference(obj->fence);
-
-		if (fobj)
-			shared_count = fobj->shared_count;
-
-		for (i = 0; !fence && i < shared_count; ++i) {
-			struct dma_fence *lfence = rcu_dereference(fobj->shared[i]);
-
-			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				     &lfence->flags))
-				continue;
-
-			if (!dma_fence_get_rcu(lfence))
-				goto unlock_retry;
-
-			if (dma_fence_is_signaled(lfence)) {
-				dma_fence_put(lfence);
-				continue;
-			}
-
-			fence = lfence;
-			break;
-		}
-	}
-
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
-	if (fence) {
-		if (read_seqcount_retry(&obj->seq, seq)) {
-			dma_fence_put(fence);
-			goto retry;
-		}
 
-		ret = dma_fence_wait_timeout(fence, intr, ret);
-		dma_fence_put(fence);
-		if (ret > 0 && wait_all && (i + 1 < shared_count))
-			goto retry;
+	if (wait_all && readers) {
+		ret = dma_fence_wait_timeout(readers, intr, ret);
+		if (ret <= 0)
+			goto out;
 	}
-	return ret;
-
-unlock_retry:
-	rcu_read_unlock();
-	goto retry;
-}
-EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
-
-
-static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
-{
-	struct dma_fence *fence, *lfence = passed_fence;
-	int ret = 1;
 
-	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
-		fence = dma_fence_get_rcu(lfence);
-		if (!fence)
-			return -1;
+	if (excl)
+		ret = dma_fence_wait_timeout(excl, intr, ret);
 
-		ret = !!dma_fence_is_signaled(fence);
-		dma_fence_put(fence);
-	}
+out:
+	dma_fence_put(excl);
+	dma_fence_put(readers);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
 
 /**
  * dma_resv_test_signaled_rcu - Test if a reservation object's
@@ -844,51 +504,23 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
  */
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 {
-	unsigned seq, shared_count;
-	int ret;
+	struct dma_fence *excl, *readers;
+	bool ret = true;
 
 	rcu_read_lock();
-retry:
-	ret = true;
-	shared_count = 0;
-	seq = read_seqcount_begin(&obj->seq);
-
-	if (test_all) {
-		unsigned i;
-
-		struct dma_resv_list *fobj = rcu_dereference(obj->fence);
-
-		if (fobj)
-			shared_count = fobj->shared_count;
-
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
-			ret = dma_resv_test_signaled_single(fence);
-			if (ret < 0)
-				goto retry;
-			else if (!ret)
-				break;
-		}
-
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto retry;
-	}
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
+	rcu_read_unlock();
 
-	if (!shared_count) {
-		struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
+	if (excl)
+		ret = dma_fence_is_signaled(excl);
 
-		if (fence_excl) {
-			ret = dma_resv_test_signaled_single(fence_excl);
-			if (ret < 0)
-				goto retry;
+	if (test_all && readers)
+		ret &= dma_fence_is_signaled(readers);
 
-			if (read_seqcount_retry(&obj->seq, seq))
-				goto retry;
-		}
-	}
+	dma_fence_put(excl);
+	dma_fence_put(readers);
 
-	rcu_read_unlock();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dma_resv_test_signaled_rcu);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 76e3516484e7..ec4f29934b09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -23,6 +23,7 @@
 #define pr_fmt(fmt) "kfd2kgd: " fmt
 
 #include <linux/dma-buf.h>
+#include <linux/dma-fence-array.h>
 #include <linux/list.h>
 #include <linux/pagemap.h>
 #include <linux/sched/mm.h>
@@ -219,54 +220,27 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 					struct amdgpu_amdkfd_fence *ef)
 {
 	struct dma_resv *resv = bo->tbo.base.resv;
-	struct dma_resv_list *old, *new;
-	unsigned int i, j, k;
+	unsigned int i;
+	int r;
 
 	if (!ef)
 		return -EINVAL;
 
-	old = dma_resv_get_list(resv);
-	if (!old)
-		return 0;
-
-	new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
-		      GFP_KERNEL);
-	if (!new)
-		return -ENOMEM;
+	r = dma_resv_fences_reserve(resv, &resv->readers, 0);
+	if (r)
+		return r;
 
-	/* Go through all the shared fences in the resevation object and sort
-	 * the interesting ones to the end of the list.
-	 */
-	for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
+	for (i = 0; i < resv->readers.staged->num_fences; ++i) {
 		struct dma_fence *f;
 
-		f = rcu_dereference_protected(old->shared[i],
-					      dma_resv_held(resv));
-
-		if (f->context == ef->base.context)
-			RCU_INIT_POINTER(new->shared[--j], f);
-		else
-			RCU_INIT_POINTER(new->shared[k++], f);
+		f = resv->readers.staged->fences[i];
+		if (f->context == ef->base.context) {
+			resv->readers.staged->fences[i] = dma_fence_get_stub();
+			dma_fence_put(f);
+		}
 	}
-	new->shared_max = old->shared_max;
-	new->shared_count = k;
-
-	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
-	write_seqcount_begin(&resv->seq);
-	RCU_INIT_POINTER(resv->fence, new);
-	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
-	/* Drop the references to the removed fences or move them to ef_list */
-	for (i = j, k = 0; i < old->shared_count; ++i) {
-		struct dma_fence *f;
-
-		f = rcu_dereference_protected(new->shared[i],
-					      dma_resv_held(resv));
-		dma_fence_put(f);
-	}
-	kfree_rcu(old, rcu);
+	dma_resv_fences_commit(resv, &resv->readers);
 
 	return 0;
 }
@@ -840,6 +814,7 @@ static int process_update_pds(struct amdkfd_process_info *process_info,
 static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 		       struct dma_fence **ef)
 {
+	struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
 	struct amdkfd_process_info *info = NULL;
 	int ret;
 
@@ -887,7 +862,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 				  AMDGPU_FENCE_OWNER_KFD, false);
 	if (ret)
 		goto wait_pd_fail;
-	ret = dma_resv_reserve_shared(vm->root.base.bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 	if (ret)
 		goto reserve_shared_fail;
 	amdgpu_bo_fence(vm->root.base.bo,
@@ -2132,7 +2107,8 @@ int amdgpu_amdkfd_add_gws_to_process(void *info, void *gws, struct kgd_mem **mem
 	 * Add process eviction fence to bo so they can
 	 * evict each other.
 	 */
-	ret = dma_resv_reserve_shared(gws_bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(gws_bo->tbo.base.resv,
+				      &gws_bo->tbo.base.resv->readers, 1);
 	if (ret)
 		goto reserve_shared_fail;
 	amdgpu_bo_fence(gws_bo, &process_info->eviction_fence->base, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index bf0f00508987..3e4685565f82 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -139,43 +139,12 @@ int amdgpu_gem_prime_mmap(struct drm_gem_object *obj,
 static int
 __dma_resv_make_exclusive(struct dma_resv *obj)
 {
-	struct dma_fence **fences;
-	unsigned int count;
-	int r;
+	struct dma_fence *readers;
+
+	readers = dma_fence_get(dma_resv_fences_deref(obj, &obj->readers));
 
-	if (!dma_resv_get_list(obj)) /* no shared fences to convert */
-		return 0;
-
-	r = dma_resv_get_fences_rcu(obj, NULL, &count, &fences);
-	if (r)
-		return r;
-
-	if (count == 0) {
-		/* Now that was unexpected. */
-	} else if (count == 1) {
-		dma_resv_add_excl_fence(obj, fences[0]);
-		dma_fence_put(fences[0]);
-		kfree(fences);
-	} else {
-		struct dma_fence_array *array;
-
-		array = dma_fence_array_create(count, fences,
-					       dma_fence_context_alloc(1), 0,
-					       false);
-		if (!array)
-			goto err_fences_put;
-
-		dma_resv_add_excl_fence(obj, &array->base);
-		dma_fence_put(&array->base);
-	}
+	dma_resv_add_excl_fence(obj, readers);
+	dma_fence_put(readers);
 
 	return 0;
-
-err_fences_put:
-	while (count--)
-		dma_fence_put(fences[count]);
-	kfree(fences);
-	return -ENOMEM;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2f11ebd95528..a2dddd8d897c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1286,7 +1286,7 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
 	struct dma_resv *resv = bo->tbo.base.resv;
 
 	if (shared)
-		dma_resv_add_shared_fence(resv, fence);
+		dma_resv_fences_add(&resv->readers, fence);
 	else
 		dma_resv_add_excl_fence(resv, fence);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 95e5e93edd18..f8bc5baf0a44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -28,6 +28,8 @@
  *    Christian König <christian.koenig@amd.com>
  */
 
+#include <linux/dma-fence-array.h>
+
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -193,10 +195,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 		     struct dma_resv *resv,
 		     void *owner, bool explicit_sync)
 {
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *array;
 	void *fence_owner;
-	unsigned i;
 	int r = 0;
 
 	if (resv == NULL)
@@ -205,14 +206,12 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 	/* always sync to the exclusive fence */
 	f = dma_resv_get_excl(resv);
 	r = amdgpu_sync_fence(adev, sync, f, false);
-
-	flist = dma_resv_get_list(resv);
-	if (!flist || r)
+	if (r)
 		return r;
 
-	for (i = 0; i < flist->shared_count; ++i) {
-		f = rcu_dereference_protected(flist->shared[i],
-					      dma_resv_held(resv));
+	array = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(f, cursor, array) {
 		/* We only want to trigger KFD eviction fences on
 		 * evict or move jobs. Skip KFD fences otherwise.
 		 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index fb09314bcfd4..3c2efd14f945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -39,6 +39,7 @@
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/swiotlb.h>
+#include <linux/dma-fence-array.h>
 
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
@@ -1470,9 +1471,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	unsigned long num_pages = bo->mem.num_pages;
 	struct drm_mm_node *node = bo->mem.mm_node;
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
-	int i;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *array;
 
 	/* Don't evict VM page tables while they are busy, otherwise we can't
 	 * cleanly handle page faults.
@@ -1485,15 +1485,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	 * If true, then return false as any KFD process needs all its BOs to
 	 * be resident to run successfully
 	 */
-	flist = dma_resv_get_list(bo->base.resv);
-	if (flist) {
-		for (i = 0; i < flist->shared_count; ++i) {
-			f = rcu_dereference_protected(flist->shared[i],
-				dma_resv_held(bo->base.resv));
-			if (amdkfd_fence_check_mm(f, current->mm))
-				return false;
-		}
-	}
+	array = dma_resv_fences_deref(bo->base.resv, &bo->base.resv->readers);
+	dma_fence_array_for_each(f, cursor, array)
+		if (amdkfd_fence_check_mm(f, current->mm))
+			return false;
 
 	switch (bo->mem.mem_type) {
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f555843daef0..c269d4c169c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2724,7 +2724,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		goto error_free_root;
 
-	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
+	r = dma_resv_fences_reserve(root->tbo.base.resv,
+				    &root->tbo.base.resv->readers, 1);
 	if (r)
 		goto error_unreserve;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 7d83e04ec36e..15f36ccd6e53 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -7,6 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/task.h>
+#include <linux/dma-fence-array.h>
 
 #include "etnaviv_drv.h"
 #include "etnaviv_gem.h"
@@ -459,9 +460,9 @@ static void etnaviv_gem_describe_fence(struct dma_fence *fence,
 static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 {
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj = obj->resv;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
+	struct dma_fence *fence, *readers;
 	unsigned long off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08lx %p %zd\n",
@@ -470,16 +471,9 @@ static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 			off, etnaviv_obj->vaddr, obj->size);
 
 	rcu_read_lock();
-	fobj = rcu_dereference(robj->fence);
-	if (fobj) {
-		unsigned int i, shared_count = fobj->shared_count;
-
-		for (i = 0; i < shared_count; i++) {
-			fence = rcu_dereference(fobj->shared[i]);
-			etnaviv_gem_describe_fence(fence, "Shared", m);
-		}
-	}
-
+	readers = rcu_dereference(robj->readers.fence);
+	dma_fence_array_for_each(fence, cursor, readers)
+		etnaviv_gem_describe_fence(fence, "Shared", m);
 	fence = rcu_dereference(robj->fence_excl);
 	if (fence)
 		etnaviv_gem_describe_fence(fence, "Exclusive", m);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 998c96b40d8a..5faab8f0e67f 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -168,7 +168,7 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		struct dma_resv *robj = bo->obj->base.resv;
 
 		if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
-			ret = dma_resv_reserve_shared(robj, 1);
+			ret = dma_resv_fences_reserve(robj, &robj->readers, 1);
 			if (ret)
 				return ret;
 		}
@@ -202,8 +202,8 @@ static void submit_attach_object_fences(struct etnaviv_gem_submit *submit)
 			dma_resv_add_excl_fence(obj->resv,
 							  submit->out_fence);
 		else
-			dma_resv_add_shared_fence(obj->resv,
-							    submit->out_fence);
+			dma_resv_fences_add(&obj->resv->readers,
+					    submit->out_fence);
 
 		submit_unlock_object(submit, i);
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
index 3d4f5775a4ba..0f18ba97b228 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
@@ -4,6 +4,8 @@
  * Copyright © 2014-2016 Intel Corporation
  */
 
+#include <linux/dma-fence-array.h>
+
 #include "gt/intel_engine.h"
 
 #include "i915_gem_ioctls.h"
@@ -81,8 +83,9 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		    struct drm_file *file)
 {
 	struct drm_i915_gem_busy *args = data;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
 	struct drm_i915_gem_object *obj;
-	struct dma_resv_list *list;
 	unsigned int seq;
 	int err;
 
@@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
 
 	/* Translate shared fences to READ set of engines */
-	list = rcu_dereference(obj->base.resv->fence);
-	if (list) {
-		unsigned int shared_count = list->shared_count, i;
-
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence =
-				rcu_dereference(list->shared[i]);
-
-			args->busy |= busy_check_reader(fence);
-		}
-	}
+	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers)
+		args->busy |= busy_check_reader(fence);
+	dma_fence_put(readers);
 
 	if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
 		goto retry;
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 274cf5b19fc9..b0f760c070b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -112,7 +112,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			 * trapping a reference to all the old fences, rather
 			 * than replace the existing fence.
 			 */
-			if (rcu_access_pointer(resv->fence)) {
+			if (rcu_access_pointer(resv->readers.fence)) {
 				dma_resv_lock(resv, NULL);
 				dma_resv_prune_fences(resv);
 				dma_resv_unlock(resv);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index fcbe433a968c..267be92c4c7a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -911,8 +911,8 @@ static void export_fence(struct i915_vma *vma,
 	 */
 	if (flags & EXEC_OBJECT_WRITE)
 		dma_resv_add_excl_fence(resv, &rq->fence);
-	else if (dma_resv_reserve_shared(resv, 1) == 0)
-		dma_resv_add_shared_fence(resv, &rq->fence);
+	else if (dma_resv_fences_reserve(resv, &resv->readers, 1) == 0)
+		dma_resv_fences_add(&resv->readers, &rq->fence);
 }
 
 int i915_vma_move_to_active(struct i915_vma *vma,
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index ff3d9acc24fc..1d5ef3d6d3e2 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -136,7 +136,7 @@ static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo,
 	int err = 0;
 
 	if (!write) {
-		err = dma_resv_reserve_shared(bo->gem.resv, 1);
+		err = dma_resv_fences_reserve(bo->gem.resv,
+					      &bo->gem.resv->readers, 1);
 		if (err)
 			return err;
 	}
@@ -298,7 +298,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 		if (submit->bos[i].flags & LIMA_SUBMIT_BO_WRITE)
 			dma_resv_add_excl_fence(bos[i]->gem.resv, fence);
 		else
-			dma_resv_add_shared_fence(bos[i]->gem.resv, fence);
+			dma_resv_fences_add(&bos[i]->gem.resv->readers, fence);
 	}
 
 	lima_gem_unlock_bos(bos, submit->nr_bos, &ctx);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 90e3dc3b927a..3d9a1d53c309 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -663,11 +663,10 @@ void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass)
 int msm_gem_sync_object(struct drm_gem_object *obj,
 		struct msm_fence_context *fctx, bool exclusive)
 {
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
-	int i, ret;
+	int ret;
 
-	fobj = dma_resv_get_list(obj->resv);
 	fence = dma_resv_get_excl(obj->resv);
 	/* don't need to wait on our own fences, since ring is fifo */
 	if (fence && (fence->context != fctx->context)) {
@@ -676,12 +675,11 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
 			return ret;
 	}
 
-	if (!exclusive || !fobj)
+	if (!exclusive)
 		return 0;
 
-	for (i = 0; i < fobj->shared_count; i++) {
-		fence = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(obj->resv));
+	readers = dma_resv_fences_deref(obj->resv, &obj->resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers) {
 		if (fence->context != fctx->context) {
 			ret = dma_fence_wait(fence, true);
 			if (ret)
@@ -701,7 +699,7 @@ void msm_gem_move_to_active(struct drm_gem_object *obj,
 	if (exclusive)
 		dma_resv_add_excl_fence(obj->resv, fence);
 	else
-		dma_resv_add_shared_fence(obj->resv, fence);
+		dma_resv_fences_add(&obj->resv->readers, fence);
 	list_del_init(&msm_obj->mm_list);
 	list_add_tail(&msm_obj->mm_list, &gpu->active_list);
 }
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 2e1556b7af26..4d038d2004fb 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -225,8 +225,8 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 			 * strange place to call it.  OTOH this is a
 			 * convenient can-fail point to hook it in.
 			 */
-			ret = dma_resv_reserve_shared(msm_obj->base.resv,
-								1);
+			ret = dma_resv_fences_reserve(msm_obj->base.resv,
+						      &msm_obj->base.resv->readers,
+						      1);
 			if (ret)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e0b1bbee936f..6b3fad03b342 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1660,7 +1660,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence, bool excl
 	if (exclusive)
 		dma_resv_add_excl_fence(resv, &fence->base);
 	else if (fence)
-		dma_resv_add_shared_fence(resv, &fence->base);
+		dma_resv_fences_add(&resv->readers, &fence->base);
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 42ddddbb49e4..350cd2a9ea51 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -28,6 +28,7 @@
 
 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
+#include <linux/dma-fence-array.h>
 #include <trace/events/dma_fence.h>
 
 #include <nvif/cl826e.h>
@@ -334,22 +335,20 @@ int
 nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool exclusive, bool intr)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
-	struct dma_fence *fence;
 	struct dma_resv *resv = nvbo->bo.base.resv;
-	struct dma_resv_list *fobj;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
 	struct nouveau_fence *f;
-	int ret = 0, i;
+	int ret = 0;
 
 	if (!exclusive) {
-		ret = dma_resv_reserve_shared(resv, 1);
+		ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 
 		if (ret)
 			return ret;
 	}
 
-	fobj = dma_resv_get_list(resv);
 	fence = dma_resv_get_excl(resv);
-
 	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
@@ -369,16 +368,14 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 		return ret;
 	}
 
-	if (!exclusive || !fobj)
+	if (!exclusive)
 		return ret;
 
-	for (i = 0; i < fobj->shared_count && !ret; ++i) {
+	readers = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
 
-		fence = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(resv));
-
 		f = nouveau_local_fence(fence, chan->drm);
 		if (f) {
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index a4f4175bbdbe..89b91793a657 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -28,6 +28,8 @@
  *  Alon Levy <alevy@redhat.com>
  */
 
+#include <linux/dma-fence-array.h>
+
 #include <drm/drm_debugfs.h>
 #include <drm/drm_file.h>
 
@@ -57,12 +59,15 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
 	struct qxl_bo *bo;
 
 	list_for_each_entry(bo, &qdev->gem.objects, list) {
-		struct dma_resv_list *fobj;
+		struct dma_resv *resv = bo->tbo.base.resv;
+		struct dma_fence_array *array;
+		struct dma_fence *readers;
 		int rel;
 
 		rcu_read_lock();
-		fobj = rcu_dereference(bo->tbo.base.resv->fence);
-		rel = fobj ? fobj->shared_count : 0;
+		readers = rcu_dereference(resv->readers.fence);
+		array = readers ? to_dma_fence_array(readers) : NULL;
+		rel = array ? array->num_fences : (readers ? 1 : 0);
 		rcu_read_unlock();
 
 		seq_printf(m, "size %ld, pc %d, num releases %d\n",
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 312216caeea2..b73cbbc60167 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -229,6 +229,7 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 static int qxl_release_validate_bo(struct qxl_bo *bo)
 {
 	struct ttm_operation_ctx ctx = { true, false };
+	struct dma_resv *resv = bo->tbo.base.resv;
 	int ret;
 
 	if (!bo->pin_count) {
@@ -238,7 +239,7 @@ static int qxl_release_validate_bo(struct qxl_bo *bo)
 			return ret;
 	}
 
-	ret = dma_resv_reserve_shared(bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 	if (ret)
 		return ret;
 
@@ -458,7 +459,7 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 	list_for_each_entry(entry, &release->bos, head) {
 		bo = entry->bo;
 
-		dma_resv_add_shared_fence(bo->base.resv, &release->base);
+		dma_resv_fences_add(&bo->base.resv->readers, &release->base);
 		ttm_bo_add_to_lru(bo);
 		dma_resv_unlock(bo->base.resv);
 	}
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 2abe1eab471f..37abde91a79b 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -873,7 +873,7 @@ void radeon_bo_fence(struct radeon_bo *bo, struct radeon_fence *fence,
 	struct dma_resv *resv = bo->tbo.base.resv;
 
 	if (shared)
-		dma_resv_add_shared_fence(resv, &fence->base);
+		dma_resv_fences_add(&resv->readers, &fence->base);
 	else
 		dma_resv_add_excl_fence(resv, &fence->base);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_sync.c b/drivers/gpu/drm/radeon/radeon_sync.c
index 55cc77a73c7b..971dd6c01e60 100644
--- a/drivers/gpu/drm/radeon/radeon_sync.c
+++ b/drivers/gpu/drm/radeon/radeon_sync.c
@@ -27,6 +27,7 @@
  * Authors:
  *    Christian König <christian.koenig@amd.com>
  */
+#include <linux/dma-fence-array.h>
 
 #include "radeon.h"
 #include "radeon_trace.h"
@@ -90,10 +91,9 @@ int radeon_sync_resv(struct radeon_device *rdev,
 		     struct dma_resv *resv,
 		     bool shared)
 {
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *readers;
 	struct radeon_fence *fence;
-	unsigned i;
 	int r = 0;
 
 	/* always sync to the exclusive fence */
@@ -104,13 +104,11 @@ int radeon_sync_resv(struct radeon_device *rdev,
 	else if (f)
 		r = dma_fence_wait(f, true);
 
-	flist = dma_resv_get_list(resv);
-	if (shared || !flist || r)
+	if (shared || r)
 		return r;
 
-	for (i = 0; i < flist->shared_count; ++i) {
-		f = rcu_dereference_protected(flist->shared[i],
-					      dma_resv_held(resv));
+	readers = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(f, cursor, readers) {
 		fence = to_radeon_fence(f);
 		if (fence && fence->rdev == rdev)
 			radeon_sync_fence(sync, fence);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index e0ad547786e8..c4c474e283bd 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -826,12 +826,13 @@ static int radeon_vm_update_ptes(struct radeon_device *rdev,
 	for (addr = start; addr < end; ) {
 		uint64_t pt_idx = addr >> radeon_vm_block_size;
 		struct radeon_bo *pt = vm->page_tables[pt_idx].bo;
+		struct dma_resv *resv = pt->tbo.base.resv;
 		unsigned nptes;
 		uint64_t pte;
 		int r;
 
 		radeon_sync_resv(rdev, &ib->sync, pt->tbo.base.resv, true);
-		r = dma_resv_reserve_shared(pt->tbo.base.resv, 1);
+		r = dma_resv_fences_reserve(resv, &resv->readers, 1);
 		if (r)
 			return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f78f52cc2e6d..19eeefbbb65a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -453,22 +453,15 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
 
 static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
 {
-	struct dma_resv_list *fobj;
 	struct dma_fence *fence;
-	int i;
 
-	fobj = dma_resv_get_list(&bo->base._resv);
 	fence = dma_resv_get_excl(&bo->base._resv);
 	if (fence && !fence->ops->signaled)
 		dma_fence_enable_sw_signaling(fence);
 
-	for (i = 0; fobj && i < fobj->shared_count; ++i) {
-		fence = rcu_dereference_protected(fobj->shared[i],
-					dma_resv_held(bo->base.resv));
-
-		if (!fence->ops->signaled)
-			dma_fence_enable_sw_signaling(fence);
-	}
+	fence = dma_resv_fences_deref(&bo->base._resv, &bo->base._resv.readers);
+	if (fence && !fence->ops->signaled)
+		dma_fence_enable_sw_signaling(fence);
 }
 
 static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
@@ -932,9 +925,11 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 	spin_unlock(&man->move_lock);
 
 	if (fence) {
-		dma_resv_add_shared_fence(bo->base.resv, fence);
+		struct dma_resv *resv = bo->base.resv;
+
+		dma_resv_fences_add(&resv->readers, fence);
 
-		ret = dma_resv_reserve_shared(bo->base.resv, 1);
+		ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 		if (unlikely(ret)) {
 			dma_fence_put(fence);
 			return ret;
@@ -1091,7 +1086,8 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 	bool type_found = false;
 	int i, ret;
 
-	ret = dma_resv_reserve_shared(bo->base.resv, 1);
+	ret = dma_resv_fences_reserve(bo->base.resv,
+				      &bo->base.resv->readers, 1);
 	if (unlikely(ret))
 		return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 131dae8f4170..b9a2778df67c 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -130,8 +130,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 			if (!entry->num_shared)
 				continue;
 
-			ret = dma_resv_reserve_shared(bo->base.resv,
-								entry->num_shared);
+			ret = dma_resv_fences_reserve(bo->base.resv,
+						      &bo->base.resv->readers,
+						      entry->num_shared);
 			if (!ret)
 				continue;
 		}
@@ -153,8 +154,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 		}
 
 		if (!ret && entry->num_shared)
-			ret = dma_resv_reserve_shared(bo->base.resv,
-								entry->num_shared);
+			ret = dma_resv_fences_reserve(bo->base.resv,
+						      &bo->base.resv->readers,
+						      entry->num_shared);
 
 		if (unlikely(ret != 0)) {
 			if (ret == -EINTR)
@@ -201,7 +203,7 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 	list_for_each_entry(entry, list, head) {
 		bo = entry->bo;
 		if (entry->num_shared)
-			dma_resv_add_shared_fence(bo->base.resv, fence);
+			dma_resv_fences_add(&bo->base.resv->readers, fence);
 		else
 			dma_resv_add_excl_fence(bo->base.resv, fence);
 		if (list_empty(&bo->lru))
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 7a06cb6e31c5..253f832c29d5 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -543,7 +543,7 @@ vc4_update_bo_seqnos(struct vc4_exec_info *exec, uint64_t seqno)
 		bo = to_vc4_bo(&exec->bo[i]->base);
 		bo->seqno = seqno;
 
-		dma_resv_add_shared_fence(bo->base.base.resv, exec->fence);
+		dma_resv_fences_add(&bo->base.base.resv->readers, exec->fence);
 	}
 
 	list_for_each_entry(bo, &exec->unref_list, unref_head) {
@@ -642,7 +642,7 @@ vc4_lock_bo_reservations(struct drm_device *dev,
 	for (i = 0; i < exec->bo_count; i++) {
 		bo = &exec->bo[i]->base;
 
-		ret = dma_resv_reserve_shared(bo->resv, 1);
+		ret = dma_resv_fences_reserve(bo->resv, &bo->resv->readers, 1);
 		if (ret) {
 			vc4_unlock_bo_reservations(dev, exec, acquire_ctx);
 			return ret;
diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
index 9268f6fc3f66..deeef3d4deba 100644
--- a/drivers/gpu/drm/vgem/vgem_fence.c
+++ b/drivers/gpu/drm/vgem/vgem_fence.c
@@ -162,8 +162,8 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
 	dma_resv_lock(resv, NULL);
 	if (arg->flags & VGEM_FENCE_WRITE)
 		dma_resv_add_excl_fence(resv, fence);
-	else if ((ret = dma_resv_reserve_shared(resv, 1)) == 0)
-		dma_resv_add_shared_fence(resv, fence);
+	else if ((ret = dma_resv_fences_reserve(resv, &resv->readers, 1)) == 0)
+		dma_resv_fences_add(&resv->readers, fence);
 	dma_resv_unlock(resv);
 
 	/* Record the fence in our idr for later signaling */
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index c70f13fa6789..b23e16975f39 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -72,32 +72,19 @@ void dma_resv_fences_add(struct dma_resv_fences *fences,
 void dma_resv_fences_commit(struct dma_resv *obj,
 			    struct dma_resv_fences *fences);
 
-/**
- * struct dma_resv_list - a list of shared fences
- * @rcu: for internal use
- * @shared_count: table of shared fences
- * @shared_max: for growing shared fence table
- * @shared: shared fence table
- */
-struct dma_resv_list {
-	struct rcu_head rcu;
-	u32 shared_count, shared_max;
-	struct dma_fence __rcu *shared[];
-};
-
 /**
  * struct dma_resv - a reservation object manages fences for a buffer
  * @lock: update side lock
  * @seq: sequence count for managing RCU read-side synchronization
  * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ * @readers: fences of read operations, used for implicit sync
  */
 struct dma_resv {
 	struct ww_mutex lock;
 	seqcount_t seq;
 
 	struct dma_fence __rcu *fence_excl;
-	struct dma_resv_list __rcu *fence;
+	struct dma_resv_fences	readers;
 };
 
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
@@ -129,20 +116,6 @@ dma_resv_fences_get_rcu(struct dma_resv_fences *fences)
 	return dma_fence_get_rcu_safe(&fences->fence);
 }
 
-/**
- * dma_resv_get_list - get the reservation object's
- * shared fence list, with update-side lock held
- * @obj: the reservation object
- *
- * Returns the shared fence list.  Does NOT take references to
- * the fence.  The obj->lock must be held.
- */
-static inline struct dma_resv_list *dma_resv_get_list(struct dma_resv *obj)
-{
-	return rcu_dereference_protected(obj->fence,
-					 dma_resv_held(obj));
-}
-
 /**
  * dma_resv_lock - lock the reservation object
  * @obj: the reservation object
@@ -266,14 +239,7 @@ static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
  */
 static inline void dma_resv_unlock(struct dma_resv *obj)
 {
-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Test shared fence slot reservation */
-	if (rcu_access_pointer(obj->fence)) {
-		struct dma_resv_list *fence = dma_resv_get_list(obj);
-
-		fence->shared_max = fence->shared_count;
-	}
-#endif
+	dma_resv_fences_commit(obj, &obj->readers);
 	ww_mutex_unlock(&obj->lock);
 }
 
@@ -324,8 +290,6 @@ dma_resv_get_excl_rcu(struct dma_resv *obj)
 
 void dma_resv_init(struct dma_resv *obj);
 void dma_resv_fini(struct dma_resv *obj);
-int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
-void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
 
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
 void dma_resv_prune_fences(struct dma_resv *obj);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
@ 2019-08-21 12:31   ` Christian König
  0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Replace the old shared fences with the new readers container
and change all callers accordingly.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
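A rough sketch of the caller-side conversion this patch applies
throughout the drivers, for reference while reviewing; bo stands in
for whatever buffer object a driver uses, and error handling is
trimmed:

	/* Before: reserve a slot, then add a shared (read) fence. */
	ret = dma_resv_reserve_shared(bo->base.resv, 1);
	if (ret)
		return ret;
	dma_resv_add_shared_fence(bo->base.resv, fence);

	/* After: reserve space in and add to the readers container. */
	ret = dma_resv_fences_reserve(bo->base.resv,
				      &bo->base.resv->readers, 1);
	if (ret)
		return ret;
	dma_resv_fences_add(&bo->base.resv->readers, fence);

The exclusive fence path, dma_resv_add_excl_fence(), keeps its calling
convention unchanged.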
 drivers/dma-buf/dma-buf.c                     |  99 +---
 drivers/dma-buf/dma-resv.c                    | 524 +++---------------
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  58 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c      |  17 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   3 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_busy.c      |  20 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c    |   2 +-
 drivers/gpu/drm/i915/i915_vma.c               |   4 +-
 drivers/gpu/drm/lima/lima_gem.c               |   4 +-
 drivers/gpu/drm/msm/msm_gem.c                 |  14 +-
 drivers/gpu/drm/msm/msm_gem_submit.c          |   4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c          |   2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c       |  19 +-
 drivers/gpu/drm/qxl/qxl_debugfs.c             |  11 +-
 drivers/gpu/drm/qxl/qxl_release.c             |   5 +-
 drivers/gpu/drm/radeon/radeon_object.c        |   2 +-
 drivers/gpu/drm/radeon/radeon_sync.c          |  14 +-
 drivers/gpu/drm/radeon/radeon_vm.c            |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  22 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c        |  12 +-
 drivers/gpu/drm/vc4/vc4_gem.c                 |   4 +-
 drivers/gpu/drm/vgem/vgem_fence.c             |   4 +-
 include/linux/dma-resv.h                      |  42 +-
 28 files changed, 224 insertions(+), 747 deletions(-)

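Also for reference, the two read-side access patterns used below; a
minimal sketch only, with do_something() as a placeholder for the
per-driver work (fence is a struct dma_fence *, cursor a struct
dma_fence_array_cursor):

	/* Reservation lock held: borrow the pointer, no reference
	 * is taken.
	 */
	readers = dma_resv_fences_deref(resv, &resv->readers);
	dma_fence_array_for_each(fence, cursor, readers)
		do_something(fence);

	/* No lock held: take a reference to the current snapshot
	 * under RCU and drop it when done.
	 */
	rcu_read_lock();
	readers = dma_resv_fences_get_rcu(&resv->readers);
	rcu_read_unlock();
	if (readers) {
		dma_fence_array_for_each(fence, cursor, readers)
			do_something(fence);
		dma_fence_put(readers);
	}
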
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 433d91d710e4..220cfa9ca82b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/dma-buf.h>
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-array.h>
 #include <linux/anon_inodes.h>
 #include <linux/export.h>
 #include <linux/debugfs.h>
@@ -194,12 +195,10 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 
 static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 {
+	struct dma_fence *fence_excl, *readers;
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence_excl;
 	__poll_t events;
-	unsigned shared_count, seq;
 
 	dmabuf = file->private_data;
 	if (!dmabuf || !dmabuf->resv)
@@ -213,26 +212,15 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 	if (!events)
 		return 0;
 
-retry:
-	seq = read_seqcount_begin(&resv->seq);
 	rcu_read_lock();
+	fence_excl = dma_fence_get_rcu_safe(&resv->fence_excl);
+	readers = dma_resv_fences_get_rcu(&resv->readers);
 
-	fobj = rcu_dereference(resv->fence);
-	if (fobj)
-		shared_count = fobj->shared_count;
-	else
-		shared_count = 0;
-	fence_excl = rcu_dereference(resv->fence_excl);
-	if (read_seqcount_retry(&resv->seq, seq)) {
-		rcu_read_unlock();
-		goto retry;
-	}
-
-	if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
+	if (fence_excl && (!(events & EPOLLOUT) || !readers)) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
 		__poll_t pevents = EPOLLIN;
 
-		if (shared_count == 0)
+		if (!readers)
 			pevents |= EPOLLOUT;
 
 		spin_lock_irq(&dmabuf->poll.lock);
@@ -244,12 +232,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		spin_unlock_irq(&dmabuf->poll.lock);
 
 		if (events & pevents) {
-			if (!dma_fence_get_rcu(fence_excl)) {
-				/* force a recheck */
-				events &= ~pevents;
-				dma_buf_poll_cb(NULL, &dcb->cb);
-			} else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
-							   dma_buf_poll_cb)) {
+			if (!dma_fence_add_callback(fence_excl, &dcb->cb,
+						    dma_buf_poll_cb)) {
 				events &= ~pevents;
 				dma_fence_put(fence_excl);
 			} else {
@@ -263,9 +247,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		}
 	}
 
-	if ((events & EPOLLOUT) && shared_count > 0) {
+	if ((events & EPOLLOUT) && readers) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
-		int i;
 
 		/* Only queue a new callback if no event has fired yet */
 		spin_lock_irq(&dmabuf->poll.lock);
@@ -278,32 +261,15 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		if (!(events & EPOLLOUT))
 			goto out;
 
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
-			if (!dma_fence_get_rcu(fence)) {
-				/*
-				 * fence refcount dropped to zero, this means
-				 * that fobj has been freed
-				 *
-				 * call dma_buf_poll_cb and force a recheck!
-				 */
-				events &= ~EPOLLOUT;
-				dma_buf_poll_cb(NULL, &dcb->cb);
-				break;
-			}
-			if (!dma_fence_add_callback(fence, &dcb->cb,
-						    dma_buf_poll_cb)) {
-				dma_fence_put(fence);
-				events &= ~EPOLLOUT;
-				break;
-			}
-			dma_fence_put(fence);
-		}
-
-		/* No callback queued, wake up any additional waiters. */
-		if (i == shared_count)
+		if (!dma_fence_add_callback(readers, &dcb->cb,
+					    dma_buf_poll_cb)) {
+			dma_fence_put(readers);
+			events &= ~EPOLLOUT;
+		} else {
+			/* No callback queued, wake up any additional waiters. */
+			dma_fence_put(readers);
 			dma_buf_poll_cb(NULL, &dcb->cb);
+		}
 	}
 
 out:
@@ -1154,11 +1120,10 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 	int ret;
 	struct dma_buf *buf_obj;
 	struct dma_buf_attachment *attach_obj;
+	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
-	unsigned seq;
-	int count = 0, attach_count, shared_count, i;
+	struct dma_fence *fence, *readers;
+	int count = 0, attach_count;
 	size_t size = 0;
 
 	ret = mutex_lock_interruptible(&db_list.lock);
@@ -1188,32 +1153,20 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 				buf_obj->name ?: "");
 
 		robj = buf_obj->resv;
-		while (true) {
-			seq = read_seqcount_begin(&robj->seq);
-			rcu_read_lock();
-			fobj = rcu_dereference(robj->fence);
-			shared_count = fobj ? fobj->shared_count : 0;
-			fence = rcu_dereference(robj->fence_excl);
-			if (!read_seqcount_retry(&robj->seq, seq))
-				break;
-			rcu_read_unlock();
-		}
-
+		rcu_read_lock();
+		fence = rcu_dereference(robj->fence_excl);
+		readers = dma_resv_fences_get_rcu(&robj->readers);
 		if (fence)
 			seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
-		for (i = 0; i < shared_count; i++) {
-			fence = rcu_dereference(fobj->shared[i]);
-			if (!dma_fence_get_rcu(fence))
-				continue;
-			seq_printf(s, "\tShared fence: %s %s %ssignalled\n",
+		dma_fence_array_for_each(fence, cursor, readers)
+			seq_printf(s, "\tReaders fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
-			dma_fence_put(fence);
-		}
+		dma_fence_put(readers);
 		rcu_read_unlock();
 
 		seq_puts(s, "\tAttached Devices:\n");
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 83033b3e8521..8723af0a7a4d 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -39,12 +39,11 @@
 /**
  * DOC: Reservation Object Overview
  *
- * The reservation object provides a mechanism to manage shared and
+ * The reservation object provides a mechanism to manage readers and
  * exclusive fences associated with a buffer.  A reservation object
  * can have attached one exclusive fence (normally associated with
- * write operations) or N shared fences (read operations).  The RCU
- * mechanism is used to protect read access to fences from locked
- * write-side updates.
+ * write operations) or N read fences.  The RCU mechanism is used to
+ * protect read access to fences from locked write-side updates.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
@@ -235,17 +234,18 @@ void dma_resv_fences_add(struct dma_resv_fences *fences,
 	struct dma_fence *old;
 	unsigned int i;
 
-#ifndef CONFIG_DEBUG_MUTEXES
 	for (i = 0; i < staged->num_fences; ++i) {
 		old = staged->fences[i];
 
-		if (old->context == fence->context ||
-		    dma_fence_is_signaled(old)) {
+		if (old->context == fence->context
+#ifndef CONFIG_DEBUG_MUTEXES
+		    || dma_fence_is_signaled(old)
+#endif
+		    ) {
 			dma_fence_put(old);
 			goto replace;
 		}
 	}
-#endif
 
 	BUG_ON(staged->num_fences >= dma_fence_array_max_fences(staged));
 	i = staged->num_fences++;
@@ -276,46 +276,6 @@ void dma_resv_fences_commit(struct dma_resv *obj,
 }
 EXPORT_SYMBOL(dma_resv_fences_commit);
 
-/**
- * dma_resv_list_alloc - allocate fence list
- * @shared_max: number of fences we need space for
- *
- * Allocate a new dma_resv_list and make sure to correctly initialize
- * shared_max.
- */
-static struct dma_resv_list *dma_resv_list_alloc(unsigned int shared_max)
-{
-	struct dma_resv_list *list;
-
-	list = kmalloc(offsetof(typeof(*list), shared[shared_max]), GFP_KERNEL);
-	if (!list)
-		return NULL;
-
-	list->shared_max = (ksize(list) - offsetof(typeof(*list), shared)) /
-		sizeof(*list->shared);
-
-	return list;
-}
-
-/**
- * dma_resv_list_free - free fence list
- * @list: list to free
- *
- * Free a dma_resv_list and make sure to drop all references.
- */
-static void dma_resv_list_free(struct dma_resv_list *list)
-{
-	unsigned int i;
-
-	if (!list)
-		return;
-
-	for (i = 0; i < list->shared_count; ++i)
-		dma_fence_put(rcu_dereference_protected(list->shared[i], true));
-
-	kfree_rcu(list, rcu);
-}
-
 /**
  * dma_resv_init - initialize a reservation object
  * @obj: the reservation object
@@ -326,8 +286,8 @@ void dma_resv_init(struct dma_resv *obj)
 
 	__seqcount_init(&obj->seq, reservation_seqcount_string,
 			&reservation_seqcount_class);
-	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
+	dma_resv_fences_init(&obj->readers);
 }
 EXPORT_SYMBOL(dma_resv_init);
 
@@ -337,151 +297,16 @@ EXPORT_SYMBOL(dma_resv_init);
  */
 void dma_resv_fini(struct dma_resv *obj)
 {
-	struct dma_resv_list *fobj;
-	struct dma_fence *excl;
-
 	/*
 	 * This object should be dead and all references must have
 	 * been released to it, so no need to be protected with rcu.
 	 */
-	excl = rcu_dereference_protected(obj->fence_excl, 1);
-	if (excl)
-		dma_fence_put(excl);
-
-	fobj = rcu_dereference_protected(obj->fence, 1);
-	dma_resv_list_free(fobj);
+	dma_fence_put(rcu_dereference_protected(obj->fence_excl, true));
+	dma_resv_fences_fini(&obj->readers);
 	ww_mutex_destroy(&obj->lock);
 }
 EXPORT_SYMBOL(dma_resv_fini);
 
-/**
- * dma_resv_reserve_shared - Reserve space to add shared fences to
- * a dma_resv.
- * @obj: reservation object
- * @num_fences: number of fences we want to add
- *
- * Should be called before dma_resv_add_shared_fence().  Must
- * be called with obj->lock held.
- *
- * RETURNS
- * Zero for success, or -errno
- */
-int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences)
-{
-	struct dma_resv_list *old, *new;
-	unsigned int i, j, k, max;
-
-	dma_resv_assert_held(obj);
-
-	old = dma_resv_get_list(obj);
-
-	if (old && old->shared_max) {
-		if ((old->shared_count + num_fences) <= old->shared_max)
-			return 0;
-		else
-			max = max(old->shared_count + num_fences,
-				  old->shared_max * 2);
-	} else {
-		max = 4;
-	}
-
-	new = dma_resv_list_alloc(max);
-	if (!new)
-		return -ENOMEM;
-
-	/*
-	 * no need to bump fence refcounts, rcu_read access
-	 * requires the use of kref_get_unless_zero, and the
-	 * references from the old struct are carried over to
-	 * the new.
-	 */
-	for (i = 0, j = 0, k = max; i < (old ? old->shared_count : 0); ++i) {
-		struct dma_fence *fence;
-
-		fence = rcu_dereference_protected(old->shared[i],
-						  dma_resv_held(obj));
-		if (dma_fence_is_signaled(fence))
-			RCU_INIT_POINTER(new->shared[--k], fence);
-		else
-			RCU_INIT_POINTER(new->shared[j++], fence);
-	}
-	new->shared_count = j;
-
-	/*
-	 * We are not changing the effective set of fences here so can
-	 * merely update the pointer to the new array; both existing
-	 * readers and new readers will see exactly the same set of
-	 * active (unsignaled) shared fences. Individual fences and the
-	 * old array are protected by RCU and so will not vanish under
-	 * the gaze of the rcu_read_lock() readers.
-	 */
-	rcu_assign_pointer(obj->fence, new);
-
-	if (!old)
-		return 0;
-
-	/* Drop the references to the signaled fences */
-	for (i = k; i < max; ++i) {
-		struct dma_fence *fence;
-
-		fence = rcu_dereference_protected(new->shared[i],
-						  dma_resv_held(obj));
-		dma_fence_put(fence);
-	}
-	kfree_rcu(old, rcu);
-
-	return 0;
-}
-EXPORT_SYMBOL(dma_resv_reserve_shared);
-
-/**
- * dma_resv_add_shared_fence - Add a fence to a shared slot
- * @obj: the reservation object
- * @fence: the shared fence to add
- *
- * Add a fence to a shared slot, obj->lock must be held, and
- * dma_resv_reserve_shared() has been called.
- */
-void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
-{
-	struct dma_resv_list *fobj;
-	struct dma_fence *old;
-	unsigned int i, count;
-
-	dma_fence_get(fence);
-
-	dma_resv_assert_held(obj);
-
-	fobj = dma_resv_get_list(obj);
-	count = fobj->shared_count;
-
-	preempt_disable();
-	write_seqcount_begin(&obj->seq);
-
-	for (i = 0; i < count; ++i) {
-
-		old = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(obj));
-		if (old->context == fence->context ||
-		    dma_fence_is_signaled(old))
-			goto replace;
-	}
-
-	BUG_ON(fobj->shared_count >= fobj->shared_max);
-	old = NULL;
-	count++;
-
-replace:
-	RCU_INIT_POINTER(fobj->shared[i], fence);
-	/* pointer update must be visible before we extend the shared_count */
-	smp_store_mb(fobj->shared_count, count);
-
-	write_seqcount_end(&obj->seq);
-	preempt_enable();
-	dma_fence_put(old);
-}
-EXPORT_SYMBOL(dma_resv_add_shared_fence);
-
 /**
  * dma_resv_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
@@ -501,6 +326,7 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
+	dma_resv_fences_set(obj, &obj->readers, fence);
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
@@ -516,9 +342,7 @@ EXPORT_SYMBOL(dma_resv_add_excl_fence);
  */
 void dma_resv_prune_fences(struct dma_resv *obj)
 {
-	struct dma_resv_list *list;
 	struct dma_fence *fence;
-	unsigned int i;
 
 	dma_resv_assert_held(obj);
 
@@ -528,20 +352,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
 		dma_fence_put(fence);
 	}
 
-	list = dma_resv_get_list(obj);
-	if (!list)
-		return;
-
-	for (i = 0; i < list->shared_count; ++i) {
-		fence = rcu_dereference_protected(list->shared[i],
-						  dma_resv_held(obj));
-
-		if (!dma_fence_is_signaled(fence))
-			continue;
-
-		RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
-		dma_fence_put(fence);
-	}
+	fence = dma_resv_fences_deref(obj, &obj->readers);
+	if (fence && dma_fence_is_signaled(fence))
+		dma_resv_fences_set(obj, &obj->readers, NULL);
 }
 EXPORT_SYMBOL(dma_resv_prune_fences);
 
@@ -554,73 +367,25 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
 */
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 {
-	struct dma_resv_list *src_list, *dst_list;
-	struct dma_fence *old, *new;
-	unsigned i;
+	struct dma_fence *old, *excl, *readers;
 
 	dma_resv_assert_held(dst);
 
 	rcu_read_lock();
-	src_list = rcu_dereference(src->fence);
-
-retry:
-	if (src_list) {
-		unsigned shared_count = src_list->shared_count;
-
-		rcu_read_unlock();
-
-		dst_list = dma_resv_list_alloc(shared_count);
-		if (!dst_list)
-			return -ENOMEM;
-
-		rcu_read_lock();
-		src_list = rcu_dereference(src->fence);
-		if (!src_list || src_list->shared_count > shared_count) {
-			kfree(dst_list);
-			goto retry;
-		}
-
-		dst_list->shared_count = 0;
-		for (i = 0; i < src_list->shared_count; ++i) {
-			struct dma_fence *fence;
-
-			fence = rcu_dereference(src_list->shared[i]);
-			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				     &fence->flags))
-				continue;
-
-			if (!dma_fence_get_rcu(fence)) {
-				dma_resv_list_free(dst_list);
-				src_list = rcu_dereference(src->fence);
-				goto retry;
-			}
-
-			if (dma_fence_is_signaled(fence)) {
-				dma_fence_put(fence);
-				continue;
-			}
-
-			rcu_assign_pointer(dst_list->shared[dst_list->shared_count++], fence);
-		}
-	} else {
-		dst_list = NULL;
-	}
-
-	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	excl = dma_fence_get_rcu_safe(&src->fence_excl);
+	readers = dma_resv_fences_get_rcu(&src->readers);
 	rcu_read_unlock();
 
-	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(dst->fence_excl, new);
-	RCU_INIT_POINTER(dst->fence, dst_list);
+	RCU_INIT_POINTER(dst->fence_excl, excl);
+	dma_resv_fences_set(dst, &dst->readers, readers);
 	write_seqcount_end(&dst->seq);
 	preempt_enable();
 
-	dma_resv_list_free(src_list);
 	dma_fence_put(old);
 
 	return 0;
@@ -628,7 +393,7 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 EXPORT_SYMBOL(dma_resv_copy_fences);
 
 /**
- * dma_resv_get_fences_rcu - Get an object's shared and exclusive
+ * dma_resv_get_fences_rcu - Get an object's readers and exclusive
  * fences without update side lock held
  * @obj: the reservation object
  * @pfence_excl: the returned exclusive fence (or NULL)
@@ -645,92 +410,51 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 			    unsigned *pshared_count,
 			    struct dma_fence ***pshared)
 {
-	struct dma_fence **shared = NULL;
-	struct dma_fence *fence_excl;
+	struct dma_fence *excl, *readers;
+	struct dma_fence **shared;
 	unsigned int shared_count;
-	int ret = 1;
-
-	do {
-		struct dma_resv_list *fobj;
-		unsigned int i, seq;
-		size_t sz = 0;
-
-		shared_count = i = 0;
-
-		rcu_read_lock();
-		seq = read_seqcount_begin(&obj->seq);
-
-		fence_excl = rcu_dereference(obj->fence_excl);
-		if (fence_excl && !dma_fence_get_rcu(fence_excl))
-			goto unlock;
-
-		fobj = rcu_dereference(obj->fence);
-		if (fobj)
-			sz += sizeof(*shared) * fobj->shared_max;
-
-		if (!pfence_excl && fence_excl)
-			sz += sizeof(*shared);
-
-		if (sz) {
-			struct dma_fence **nshared;
-
-			nshared = krealloc(shared, sz,
-					   GFP_NOWAIT | __GFP_NOWARN);
-			if (!nshared) {
-				rcu_read_unlock();
-
-				dma_fence_put(fence_excl);
-				fence_excl = NULL;
-
-				nshared = krealloc(shared, sz, GFP_KERNEL);
-				if (nshared) {
-					shared = nshared;
-					continue;
-				}
-
-				ret = -ENOMEM;
-				break;
-			}
-			shared = nshared;
-			shared_count = fobj ? fobj->shared_count : 0;
-			for (i = 0; i < shared_count; ++i) {
-				shared[i] = rcu_dereference(fobj->shared[i]);
-				if (!dma_fence_get_rcu(shared[i]))
-					break;
-			}
-		}
 
-		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
-			while (i--)
-				dma_fence_put(shared[i]);
-			dma_fence_put(fence_excl);
-			goto unlock;
-		}
+	rcu_read_lock();
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
+	rcu_read_unlock();
 
-		ret = 0;
-unlock:
-		rcu_read_unlock();
-	} while (ret);
+	shared_count = readers ? 1 : 0;
+	if (excl && !pfence_excl)
+		++shared_count;
 
-	if (pfence_excl)
-		*pfence_excl = fence_excl;
-	else if (fence_excl)
-		shared[++shared_count] = fence_excl;
+	if (shared_count) {
+		shared = kmalloc_array(shared_count, sizeof(*shared),
+				       GFP_KERNEL);
+		if (!shared) {
+			dma_fence_put(excl);
+			dma_fence_put(readers);
+			return -ENOMEM;
+		}
 
-	if (!shared_count) {
-		kfree(shared);
-		shared = NULL;
+		shared_count = 0;
+		if (readers)
+			shared[shared_count++] = readers;
+		if (excl && !pfence_excl)
+			shared[shared_count++] = excl;
+
+		*pshared = shared;
+		*pshared_count = shared_count;
+	} else {
+		*pshared = NULL;
+		*pshared_count = 0;
 	}
 
-	*pshared_count = shared_count;
-	*pshared = shared;
-	return ret;
+	if (pfence_excl)
+		*pfence_excl = excl;
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences_rcu);
 
 /**
  * dma_resv_wait_timeout_rcu - Wait on reservation's objects
- * shared and/or exclusive fences.
+ * readers and/or exclusive fences.
  * @obj: the reservation object
  * @wait_all: if true, wait on all fences, else wait on just exclusive fence
  * @intr: if true, do interruptible wait
@@ -744,93 +468,29 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			       bool wait_all, bool intr,
 			       unsigned long timeout)
 {
-	struct dma_fence *fence;
-	unsigned seq, shared_count;
+	struct dma_fence *excl, *readers;
 	long ret = timeout ? timeout : 1;
-	int i;
 
-retry:
-	shared_count = 0;
-	seq = read_seqcount_begin(&obj->seq);
 	rcu_read_lock();
-	i = -1;
-
-	fence = rcu_dereference(obj->fence_excl);
-	if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-		if (!dma_fence_get_rcu(fence))
-			goto unlock_retry;
-
-		if (dma_fence_is_signaled(fence)) {
-			dma_fence_put(fence);
-			fence = NULL;
-		}
-
-	} else {
-		fence = NULL;
-	}
-
-	if (wait_all) {
-		struct dma_resv_list *fobj = rcu_dereference(obj->fence);
-
-		if (fobj)
-			shared_count = fobj->shared_count;
-
-		for (i = 0; !fence && i < shared_count; ++i) {
-			struct dma_fence *lfence = rcu_dereference(fobj->shared[i]);
-
-			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				     &lfence->flags))
-				continue;
-
-			if (!dma_fence_get_rcu(lfence))
-				goto unlock_retry;
-
-			if (dma_fence_is_signaled(lfence)) {
-				dma_fence_put(lfence);
-				continue;
-			}
-
-			fence = lfence;
-			break;
-		}
-	}
-
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
-	if (fence) {
-		if (read_seqcount_retry(&obj->seq, seq)) {
-			dma_fence_put(fence);
-			goto retry;
-		}
 
-		ret = dma_fence_wait_timeout(fence, intr, ret);
-		dma_fence_put(fence);
-		if (ret > 0 && wait_all && (i + 1 < shared_count))
-			goto retry;
+	if (wait_all && readers) {
+		ret = dma_fence_wait_timeout(readers, intr, ret);
+		if (ret <= 0)
+			goto out;
 	}
-	return ret;
-
-unlock_retry:
-	rcu_read_unlock();
-	goto retry;
-}
-EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
-
-
-static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
-{
-	struct dma_fence *fence, *lfence = passed_fence;
-	int ret = 1;
 
-	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
-		fence = dma_fence_get_rcu(lfence);
-		if (!fence)
-			return -1;
+	if (excl)
+		ret = dma_fence_wait_timeout(excl, intr, ret);
 
-		ret = !!dma_fence_is_signaled(fence);
-		dma_fence_put(fence);
-	}
+out:
+	dma_fence_put(excl);
+	dma_fence_put(readers);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
 
 /**
  * dma_resv_test_signaled_rcu - Test if a reservation object's
@@ -844,51 +504,23 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
  */
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 {
-	unsigned seq, shared_count;
-	int ret;
+	struct dma_fence *excl, *readers;
+	bool ret = true;
 
 	rcu_read_lock();
-retry:
-	ret = true;
-	shared_count = 0;
-	seq = read_seqcount_begin(&obj->seq);
-
-	if (test_all) {
-		unsigned i;
-
-		struct dma_resv_list *fobj = rcu_dereference(obj->fence);
-
-		if (fobj)
-			shared_count = fobj->shared_count;
-
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
-			ret = dma_resv_test_signaled_single(fence);
-			if (ret < 0)
-				goto retry;
-			else if (!ret)
-				break;
-		}
-
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto retry;
-	}
+	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	readers = dma_resv_fences_get_rcu(&obj->readers);
+	rcu_read_unlock();
 
-	if (!shared_count) {
-		struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
+	if (excl)
+		ret = dma_fence_is_signaled(excl);
 
-		if (fence_excl) {
-			ret = dma_resv_test_signaled_single(fence_excl);
-			if (ret < 0)
-				goto retry;
+	if (test_all && readers)
+		ret &= dma_fence_is_signaled(readers);
 
-			if (read_seqcount_retry(&obj->seq, seq))
-				goto retry;
-		}
-	}
+	dma_fence_put(excl);
+	dma_fence_put(readers);
 
-	rcu_read_unlock();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dma_resv_test_signaled_rcu);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 76e3516484e7..ec4f29934b09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -23,6 +23,7 @@
 #define pr_fmt(fmt) "kfd2kgd: " fmt
 
 #include <linux/dma-buf.h>
+#include <linux/dma-fence-array.h>
 #include <linux/list.h>
 #include <linux/pagemap.h>
 #include <linux/sched/mm.h>
@@ -219,54 +220,27 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 					struct amdgpu_amdkfd_fence *ef)
 {
 	struct dma_resv *resv = bo->tbo.base.resv;
-	struct dma_resv_list *old, *new;
-	unsigned int i, j, k;
+	unsigned int i;
+	int r;
 
 	if (!ef)
 		return -EINVAL;
 
-	old = dma_resv_get_list(resv);
-	if (!old)
-		return 0;
-
-	new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
-		      GFP_KERNEL);
-	if (!new)
-		return -ENOMEM;
+	r = dma_resv_fences_reserve(resv, &resv->readers, 0);
+	if (r)
+		return r;
 
-	/* Go through all the shared fences in the resevation object and sort
-	 * the interesting ones to the end of the list.
-	 */
-	for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
+	for (i = 0; i < resv->readers.staged->num_fences; ++i) {
 		struct dma_fence *f;
 
-		f = rcu_dereference_protected(old->shared[i],
-					      dma_resv_held(resv));
-
-		if (f->context == ef->base.context)
-			RCU_INIT_POINTER(new->shared[--j], f);
-		else
-			RCU_INIT_POINTER(new->shared[k++], f);
+		f = resv->readers.staged->fences[i];
+		if (f->context == ef->base.context) {
+			resv->readers.staged->fences[i] = dma_fence_get_stub();
+			dma_fence_put(f);
+		}
 	}
-	new->shared_max = old->shared_max;
-	new->shared_count = k;
-
-	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
-	write_seqcount_begin(&resv->seq);
-	RCU_INIT_POINTER(resv->fence, new);
-	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
-	/* Drop the references to the removed fences or move them to ef_list */
-	for (i = j, k = 0; i < old->shared_count; ++i) {
-		struct dma_fence *f;
-
-		f = rcu_dereference_protected(new->shared[i],
-					      dma_resv_held(resv));
-		dma_fence_put(f);
-	}
-	kfree_rcu(old, rcu);
+	dma_resv_fences_commit(resv, &resv->readers);
 
 	return 0;
 }
@@ -840,6 +814,7 @@ static int process_update_pds(struct amdkfd_process_info *process_info,
 static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 		       struct dma_fence **ef)
 {
+	struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
 	struct amdkfd_process_info *info = NULL;
 	int ret;
 
@@ -887,7 +862,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 				  AMDGPU_FENCE_OWNER_KFD, false);
 	if (ret)
 		goto wait_pd_fail;
-	ret = dma_resv_reserve_shared(vm->root.base.bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 	if (ret)
 		goto reserve_shared_fail;
 	amdgpu_bo_fence(vm->root.base.bo,
@@ -2132,7 +2107,8 @@ int amdgpu_amdkfd_add_gws_to_process(void *info, void *gws, struct kgd_mem **mem
 	 * Add process eviction fence to bo so they can
 	 * evict each other.
 	 */
-	ret = dma_resv_reserve_shared(gws_bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(gws_bo->tbo.base.resv,
+				      &gws_bo->tbo.base.resv->readers, 1);
 	if (ret)
 		goto reserve_shared_fail;
 	amdgpu_bo_fence(gws_bo, &process_info->eviction_fence->base, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index bf0f00508987..3e4685565f82 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -139,43 +139,12 @@ int amdgpu_gem_prime_mmap(struct drm_gem_object *obj,
 static int
 __dma_resv_make_exclusive(struct dma_resv *obj)
 {
-	struct dma_fence **fences;
-	unsigned int count;
-	int r;
+	struct dma_fence *readers = dma_resv_fences_deref(obj, &obj->readers);
 
-	if (!dma_resv_get_list(obj)) /* no shared fences to convert */
-		return 0;
-
-	r = dma_resv_get_fences_rcu(obj, NULL, &count, &fences);
-	if (r)
-		return r;
-
-	if (count == 0) {
-		/* Now that was unexpected. */
-	} else if (count == 1) {
-		dma_resv_add_excl_fence(obj, fences[0]);
-		dma_fence_put(fences[0]);
-		kfree(fences);
-	} else {
-		struct dma_fence_array *array;
-
-		array = dma_fence_array_create(count, fences,
-					       dma_fence_context_alloc(1), 0,
-					       false);
-		if (!array)
-			goto err_fences_put;
-
-		dma_resv_add_excl_fence(obj, &array->base);
-		dma_fence_put(&array->base);
-	}
+	if (readers)
+		dma_resv_add_excl_fence(obj, readers);
 
 	return 0;
-
-err_fences_put:
-	while (count--)
-		dma_fence_put(fences[count]);
-	kfree(fences);
-	return -ENOMEM;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2f11ebd95528..a2dddd8d897c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1286,7 +1286,7 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
 	struct dma_resv *resv = bo->tbo.base.resv;
 
 	if (shared)
-		dma_resv_add_shared_fence(resv, fence);
+		dma_resv_fences_add(&resv->readers, fence);
 	else
 		dma_resv_add_excl_fence(resv, fence);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 95e5e93edd18..f8bc5baf0a44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -28,6 +28,8 @@
  *    Christian König <christian.koenig@amd.com>
  */
 
+#include <linux/dma-fence-array.h>
+
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -193,10 +195,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 		     struct dma_resv *resv,
 		     void *owner, bool explicit_sync)
 {
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *array;
 	void *fence_owner;
-	unsigned i;
 	int r = 0;
 
 	if (resv == NULL)
@@ -205,14 +206,12 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 	/* always sync to the exclusive fence */
 	f = dma_resv_get_excl(resv);
 	r = amdgpu_sync_fence(adev, sync, f, false);
-
-	flist = dma_resv_get_list(resv);
-	if (!flist || r)
+	if (r)
 		return r;
 
-	for (i = 0; i < flist->shared_count; ++i) {
-		f = rcu_dereference_protected(flist->shared[i],
-					      dma_resv_held(resv));
+	array = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(f, cursor, array) {
 		/* We only want to trigger KFD eviction fences on
 		 * evict or move jobs. Skip KFD fences otherwise.
 		 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index fb09314bcfd4..3c2efd14f945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -39,6 +39,7 @@
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/swiotlb.h>
+#include <linux/dma-fence-array.h>
 
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
@@ -1470,9 +1471,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	unsigned long num_pages = bo->mem.num_pages;
 	struct drm_mm_node *node = bo->mem.mm_node;
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
-	int i;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *array;
 
 	/* Don't evict VM page tables while they are busy, otherwise we can't
 	 * cleanly handle page faults.
@@ -1485,15 +1485,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	 * If true, then return false as any KFD process needs all its BOs to
 	 * be resident to run successfully
 	 */
-	flist = dma_resv_get_list(bo->base.resv);
-	if (flist) {
-		for (i = 0; i < flist->shared_count; ++i) {
-			f = rcu_dereference_protected(flist->shared[i],
-				dma_resv_held(bo->base.resv));
-			if (amdkfd_fence_check_mm(f, current->mm))
-				return false;
-		}
-	}
+	array = dma_resv_fences_deref(bo->base.resv, &bo->base.resv->readers);
+	dma_fence_array_for_each(f, cursor, array)
+		if (amdkfd_fence_check_mm(f, current->mm))
+			return false;
 
 	switch (bo->mem.mem_type) {
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f555843daef0..c269d4c169c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2724,7 +2724,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		goto error_free_root;
 
-	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
+	r = dma_resv_fences_reserve(root->tbo.base.resv,
+				    &root->tbo.base.resv->readers, 1);
 	if (r)
 		goto error_unreserve;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 7d83e04ec36e..15f36ccd6e53 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -7,6 +7,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/task.h>
+#include <linux/dma-fence-array.h>
 
 #include "etnaviv_drv.h"
 #include "etnaviv_gem.h"
@@ -459,9 +460,9 @@ static void etnaviv_gem_describe_fence(struct dma_fence *fence,
 static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 {
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj = obj->resv;
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
+	struct dma_fence *fence, *readers;
 	unsigned long off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08lx %p %zd\n",
@@ -470,16 +471,9 @@ static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 			off, etnaviv_obj->vaddr, obj->size);
 
 	rcu_read_lock();
-	fobj = rcu_dereference(robj->fence);
-	if (fobj) {
-		unsigned int i, shared_count = fobj->shared_count;
-
-		for (i = 0; i < shared_count; i++) {
-			fence = rcu_dereference(fobj->shared[i]);
-			etnaviv_gem_describe_fence(fence, "Shared", m);
-		}
-	}
-
+	readers = rcu_dereference(robj->readers.fence);
+	dma_fence_array_for_each(fence, cursor, readers)
+		etnaviv_gem_describe_fence(fence, "Shared", m);
 	fence = rcu_dereference(robj->fence_excl);
 	if (fence)
 		etnaviv_gem_describe_fence(fence, "Exclusive", m);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 998c96b40d8a..5faab8f0e67f 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -168,7 +168,7 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		struct dma_resv *robj = bo->obj->base.resv;
 
 		if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
-			ret = dma_resv_reserve_shared(robj, 1);
+			ret = dma_resv_fences_reserve(robj, &robj->readers, 1);
 			if (ret)
 				return ret;
 		}
@@ -202,8 +202,8 @@ static void submit_attach_object_fences(struct etnaviv_gem_submit *submit)
 			dma_resv_add_excl_fence(obj->resv,
 							  submit->out_fence);
 		else
-			dma_resv_add_shared_fence(obj->resv,
-							    submit->out_fence);
+			dma_resv_fences_add(&obj->resv->readers,
+					    submit->out_fence);
 
 		submit_unlock_object(submit, i);
 	}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
index 3d4f5775a4ba..0f18ba97b228 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
@@ -4,6 +4,8 @@
  * Copyright © 2014-2016 Intel Corporation
  */
 
+#include <linux/dma-fence-array.h>
+
 #include "gt/intel_engine.h"
 
 #include "i915_gem_ioctls.h"
@@ -81,8 +83,9 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		    struct drm_file *file)
 {
 	struct drm_i915_gem_busy *args = data;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
 	struct drm_i915_gem_object *obj;
-	struct dma_resv_list *list;
 	unsigned int seq;
 	int err;
 
@@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
 
 	/* Translate shared fences to READ set of engines */
-	list = rcu_dereference(obj->base.resv->fence);
-	if (list) {
-		unsigned int shared_count = list->shared_count, i;
-
-		for (i = 0; i < shared_count; ++i) {
-			struct dma_fence *fence =
-				rcu_dereference(list->shared[i]);
-
-			args->busy |= busy_check_reader(fence);
-		}
-	}
+	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers)
+		args->busy |= busy_check_reader(fence);
+	dma_fence_put(readers);
 
 	if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
 		goto retry;
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 274cf5b19fc9..b0f760c070b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -112,7 +112,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			 * trapping a reference to all the old fences, rather
 			 * than replace the existing fence.
 			 */
-			if (rcu_access_pointer(resv->fence)) {
+			if (rcu_access_pointer(resv->readers.fence)) {
 				dma_resv_lock(resv, NULL);
 				dma_resv_prune_fences(resv);
 				dma_resv_unlock(resv);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index fcbe433a968c..267be92c4c7a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -911,8 +911,8 @@ static void export_fence(struct i915_vma *vma,
 	 */
 	if (flags & EXEC_OBJECT_WRITE)
 		dma_resv_add_excl_fence(resv, &rq->fence);
-	else if (dma_resv_reserve_shared(resv, 1) == 0)
-		dma_resv_add_shared_fence(resv, &rq->fence);
+	else if (dma_resv_fences_reserve(resv, &resv->readers, 1) == 0)
+		dma_resv_fences_add(&resv->readers, &rq->fence);
 }
 
 int i915_vma_move_to_active(struct i915_vma *vma,
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index ff3d9acc24fc..1d5ef3d6d3e2 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -136,7 +136,7 @@ static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo,
 	int err = 0;
 
 	if (!write) {
-		err = dma_resv_reserve_shared(bo->gem.resv, 1);
+		err = dma_resv_fences_reserve(bo->gem.resv,
+					      &bo->gem.resv->readers, 1);
 		if (err)
 			return err;
 	}
@@ -298,7 +298,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 		if (submit->bos[i].flags & LIMA_SUBMIT_BO_WRITE)
 			dma_resv_add_excl_fence(bos[i]->gem.resv, fence);
 		else
-			dma_resv_add_shared_fence(bos[i]->gem.resv, fence);
+			dma_resv_fences_add(&bos[i]->gem.resv->readers, fence);
 	}
 
 	lima_gem_unlock_bos(bos, submit->nr_bos, &ctx);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 90e3dc3b927a..3d9a1d53c309 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -663,11 +663,10 @@ void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass)
 int msm_gem_sync_object(struct drm_gem_object *obj,
 		struct msm_fence_context *fctx, bool exclusive)
 {
-	struct dma_resv_list *fobj;
-	struct dma_fence *fence;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
-	int i, ret;
+	int ret;
 
-	fobj = dma_resv_get_list(obj->resv);
 	fence = dma_resv_get_excl(obj->resv);
 	/* don't need to wait on our own fences, since ring is fifo */
 	if (fence && (fence->context != fctx->context)) {
@@ -676,12 +675,11 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
 			return ret;
 	}
 
-	if (!exclusive || !fobj)
+	if (!exclusive)
 		return 0;
 
-	for (i = 0; i < fobj->shared_count; i++) {
-		fence = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(obj->resv));
+	readers = dma_resv_fences_deref(obj->resv, &obj->resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers) {
 		if (fence->context != fctx->context) {
 			ret = dma_fence_wait(fence, true);
 			if (ret)
@@ -701,7 +699,7 @@ void msm_gem_move_to_active(struct drm_gem_object *obj,
 	if (exclusive)
 		dma_resv_add_excl_fence(obj->resv, fence);
 	else
-		dma_resv_add_shared_fence(obj->resv, fence);
+		dma_resv_fences_add(&obj->resv->readers, fence);
 	list_del_init(&msm_obj->mm_list);
 	list_add_tail(&msm_obj->mm_list, &gpu->active_list);
 }
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 2e1556b7af26..4d038d2004fb 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -225,8 +225,8 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 			 * strange place to call it.  OTOH this is a
 			 * convenient can-fail point to hook it in.
 			 */
-			ret = dma_resv_reserve_shared(msm_obj->base.resv,
-								1);
+			ret = dma_resv_fences_reserve(msm_obj->base.resv,
+						      &msm_obj->base.resv->readers,
+						      1);
 			if (ret)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e0b1bbee936f..6b3fad03b342 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1660,7 +1660,7 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence, bool excl
 	if (exclusive)
 		dma_resv_add_excl_fence(resv, &fence->base);
 	else if (fence)
-		dma_resv_add_shared_fence(resv, &fence->base);
+		dma_resv_fences_add(&resv->readers, &fence->base);
 }
 
 struct ttm_bo_driver nouveau_bo_driver = {
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 42ddddbb49e4..350cd2a9ea51 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -28,6 +28,7 @@
 
 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
+#include <linux/dma-fence-array.h>
 #include <trace/events/dma_fence.h>
 
 #include <nvif/cl826e.h>
@@ -334,22 +335,20 @@ int
 nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool exclusive, bool intr)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
-	struct dma_fence *fence;
 	struct dma_resv *resv = nvbo->bo.base.resv;
-	struct dma_resv_list *fobj;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *fence, *readers;
 	struct nouveau_fence *f;
-	int ret = 0, i;
+	int ret = 0;
 
 	if (!exclusive) {
-		ret = dma_resv_reserve_shared(resv, 1);
+		ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 
 		if (ret)
 			return ret;
 	}
 
-	fobj = dma_resv_get_list(resv);
 	fence = dma_resv_get_excl(resv);
-
 	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
@@ -369,16 +368,14 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 		return ret;
 	}
 
-	if (!exclusive || !fobj)
+	if (!exclusive)
 		return ret;
 
-	for (i = 0; i < fobj->shared_count && !ret; ++i) {
+	readers = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(fence, cursor, readers) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
 
-		fence = rcu_dereference_protected(fobj->shared[i],
-						dma_resv_held(resv));
-
 		f = nouveau_local_fence(fence, chan->drm);
 		if (f) {
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index a4f4175bbdbe..89b91793a657 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -28,6 +28,8 @@
  *  Alon Levy <alevy@redhat.com>
  */
 
+#include <linux/dma-fence-array.h>
+
 #include <drm/drm_debugfs.h>
 #include <drm/drm_file.h>
 
@@ -57,12 +59,15 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
 	struct qxl_bo *bo;
 
 	list_for_each_entry(bo, &qdev->gem.objects, list) {
-		struct dma_resv_list *fobj;
+		struct dma_resv *resv = bo->tbo.base.resv;
+		struct dma_fence_array *array;
+		struct dma_fence *readers;
 		int rel;
 
 		rcu_read_lock();
-		fobj = rcu_dereference(bo->tbo.base.resv->fence);
-		rel = fobj ? fobj->shared_count : 0;
+		readers = dma_resv_fences_deref(resv, &resv->readers);
+		array = to_dma_fence_array(readers);
+		rel = array ? array->num_fences : readers ? 1 : 0;
 		rcu_read_unlock();
 
 		seq_printf(m, "size %ld, pc %d, num releases %d\n",
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 312216caeea2..b73cbbc60167 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -229,6 +229,7 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 static int qxl_release_validate_bo(struct qxl_bo *bo)
 {
 	struct ttm_operation_ctx ctx = { true, false };
+	struct dma_resv *resv = bo->tbo.base.resv;
 	int ret;
 
 	if (!bo->pin_count) {
@@ -238,7 +239,7 @@ static int qxl_release_validate_bo(struct qxl_bo *bo)
 			return ret;
 	}
 
-	ret = dma_resv_reserve_shared(bo->tbo.base.resv, 1);
+	ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 	if (ret)
 		return ret;
 
@@ -458,7 +459,7 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 	list_for_each_entry(entry, &release->bos, head) {
 		bo = entry->bo;
 
-		dma_resv_add_shared_fence(bo->base.resv, &release->base);
+		dma_resv_fences_add(&bo->base.resv->readers, &release->base);
 		ttm_bo_add_to_lru(bo);
 		dma_resv_unlock(bo->base.resv);
 	}
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 2abe1eab471f..37abde91a79b 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -873,7 +873,7 @@ void radeon_bo_fence(struct radeon_bo *bo, struct radeon_fence *fence,
 	struct dma_resv *resv = bo->tbo.base.resv;
 
 	if (shared)
-		dma_resv_add_shared_fence(resv, &fence->base);
+		dma_resv_fences_add(&resv->readers, &fence->base);
 	else
 		dma_resv_add_excl_fence(resv, &fence->base);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_sync.c b/drivers/gpu/drm/radeon/radeon_sync.c
index 55cc77a73c7b..971dd6c01e60 100644
--- a/drivers/gpu/drm/radeon/radeon_sync.c
+++ b/drivers/gpu/drm/radeon/radeon_sync.c
@@ -27,6 +27,7 @@
  * Authors:
  *    Christian König <christian.koenig@amd.com>
  */
+#include <linux/dma-fence-array.h>
 
 #include "radeon.h"
 #include "radeon_trace.h"
@@ -90,10 +91,9 @@ int radeon_sync_resv(struct radeon_device *rdev,
 		     struct dma_resv *resv,
 		     bool shared)
 {
-	struct dma_resv_list *flist;
-	struct dma_fence *f;
+	struct dma_fence_array_cursor cursor;
+	struct dma_fence *f, *readers;
 	struct radeon_fence *fence;
-	unsigned i;
 	int r = 0;
 
 	/* always sync to the exclusive fence */
@@ -104,13 +104,11 @@ int radeon_sync_resv(struct radeon_device *rdev,
 	else if (f)
 		r = dma_fence_wait(f, true);
 
-	flist = dma_resv_get_list(resv);
-	if (shared || !flist || r)
+	if (shared || r)
 		return r;
 
-	for (i = 0; i < flist->shared_count; ++i) {
-		f = rcu_dereference_protected(flist->shared[i],
-					      dma_resv_held(resv));
+	readers = dma_resv_fences_deref(resv, &resv->readers);
+	dma_fence_array_for_each(f, cursor, readers) {
 		fence = to_radeon_fence(f);
 		if (fence && fence->rdev == rdev)
 			radeon_sync_fence(sync, fence);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index e0ad547786e8..c4c474e283bd 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -826,12 +826,13 @@ static int radeon_vm_update_ptes(struct radeon_device *rdev,
 	for (addr = start; addr < end; ) {
 		uint64_t pt_idx = addr >> radeon_vm_block_size;
 		struct radeon_bo *pt = vm->page_tables[pt_idx].bo;
+		struct dma_resv *resv = pt->tbo.base.resv;
 		unsigned nptes;
 		uint64_t pte;
 		int r;
 
 		radeon_sync_resv(rdev, &ib->sync, pt->tbo.base.resv, true);
-		r = dma_resv_reserve_shared(pt->tbo.base.resv, 1);
+		r = dma_resv_fences_reserve(resv, &resv->readers, 1);
 		if (r)
 			return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f78f52cc2e6d..19eeefbbb65a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -453,22 +453,15 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
 
 static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
 {
-	struct dma_resv_list *fobj;
 	struct dma_fence *fence;
-	int i;
 
-	fobj = dma_resv_get_list(&bo->base._resv);
 	fence = dma_resv_get_excl(&bo->base._resv);
 	if (fence && !fence->ops->signaled)
 		dma_fence_enable_sw_signaling(fence);
 
-	for (i = 0; fobj && i < fobj->shared_count; ++i) {
-		fence = rcu_dereference_protected(fobj->shared[i],
-					dma_resv_held(bo->base.resv));
-
-		if (!fence->ops->signaled)
-			dma_fence_enable_sw_signaling(fence);
-	}
+	fence = dma_resv_fences_deref(&bo->base._resv, &bo->base._resv.readers);
+	if (fence && !fence->ops->signaled)
+		dma_fence_enable_sw_signaling(fence);
 }
 
 static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
@@ -932,9 +925,11 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 	spin_unlock(&man->move_lock);
 
 	if (fence) {
-		dma_resv_add_shared_fence(bo->base.resv, fence);
+		struct dma_resv *resv = bo->base.resv;
+
+		dma_resv_fences_add(&resv->readers, fence);
 
-		ret = dma_resv_reserve_shared(bo->base.resv, 1);
+		ret = dma_resv_fences_reserve(resv, &resv->readers, 1);
 		if (unlikely(ret)) {
 			dma_fence_put(fence);
 			return ret;
@@ -1091,7 +1086,8 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 	bool type_found = false;
 	int i, ret;
 
-	ret = dma_resv_reserve_shared(bo->base.resv, 1);
+	ret = dma_resv_fences_reserve(bo->base.resv,
+				      &bo->base.resv->readers, 1);
 	if (unlikely(ret))
 		return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 131dae8f4170..b9a2778df67c 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -130,8 +130,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 			if (!entry->num_shared)
 				continue;
 
-			ret = dma_resv_reserve_shared(bo->base.resv,
-								entry->num_shared);
+			ret = dma_resv_fences_reserve(bo->base.resv,
+						      &bo->base.resv->readers,
+						      entry->num_shared);
 			if (!ret)
 				continue;
 		}
@@ -153,8 +154,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 		}
 
 		if (!ret && entry->num_shared)
-			ret = dma_resv_reserve_shared(bo->base.resv,
-								entry->num_shared);
+			ret = dma_resv_fences_reserve(bo->base.resv,
+						      &bo->base.resv->readers,
+						      entry->num_shared);
 
 		if (unlikely(ret != 0)) {
 			if (ret == -EINTR)
@@ -201,7 +203,7 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 	list_for_each_entry(entry, list, head) {
 		bo = entry->bo;
 		if (entry->num_shared)
-			dma_resv_add_shared_fence(bo->base.resv, fence);
+			dma_resv_fences_add(&bo->base.resv->readers, fence);
 		else
 			dma_resv_add_excl_fence(bo->base.resv, fence);
 		if (list_empty(&bo->lru))
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 7a06cb6e31c5..253f832c29d5 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -543,7 +543,7 @@ vc4_update_bo_seqnos(struct vc4_exec_info *exec, uint64_t seqno)
 		bo = to_vc4_bo(&exec->bo[i]->base);
 		bo->seqno = seqno;
 
-		dma_resv_add_shared_fence(bo->base.base.resv, exec->fence);
+		dma_resv_fences_add(&bo->base.base.resv->readers, exec->fence);
 	}
 
 	list_for_each_entry(bo, &exec->unref_list, unref_head) {
@@ -642,7 +642,7 @@ vc4_lock_bo_reservations(struct drm_device *dev,
 	for (i = 0; i < exec->bo_count; i++) {
 		bo = &exec->bo[i]->base;
 
-		ret = dma_resv_reserve_shared(bo->resv, 1);
+		ret = dma_resv_fences_reserve(bo->resv, &bo->resv->readers, 1);
 		if (ret) {
 			vc4_unlock_bo_reservations(dev, exec, acquire_ctx);
 			return ret;
diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
index 9268f6fc3f66..deeef3d4deba 100644
--- a/drivers/gpu/drm/vgem/vgem_fence.c
+++ b/drivers/gpu/drm/vgem/vgem_fence.c
@@ -162,8 +162,8 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
 	dma_resv_lock(resv, NULL);
 	if (arg->flags & VGEM_FENCE_WRITE)
 		dma_resv_add_excl_fence(resv, fence);
-	else if ((ret = dma_resv_reserve_shared(resv, 1)) == 0)
-		dma_resv_add_shared_fence(resv, fence);
+	else if ((ret = dma_resv_fences_reserve(resv, &resv->readers, 1)) == 0)
+		dma_resv_fences_add(&resv->readers, fence);
 	dma_resv_unlock(resv);
 
 	/* Record the fence in our idr for later signaling */
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index c70f13fa6789..b23e16975f39 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -72,32 +72,19 @@ void dma_resv_fences_add(struct dma_resv_fences *fences,
 void dma_resv_fences_commit(struct dma_resv *obj,
 			    struct dma_resv_fences *fences);
 
-/**
- * struct dma_resv_list - a list of shared fences
- * @rcu: for internal use
- * @shared_count: table of shared fences
- * @shared_max: for growing shared fence table
- * @shared: shared fence table
- */
-struct dma_resv_list {
-	struct rcu_head rcu;
-	u32 shared_count, shared_max;
-	struct dma_fence __rcu *shared[];
-};
-
 /**
  * struct dma_resv - a reservation object manages fences for a buffer
  * @lock: update side lock
  * @seq: sequence count for managing RCU read-side synchronization
  * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ * @readers: array of read operations for implicit sync
  */
 struct dma_resv {
 	struct ww_mutex lock;
 	seqcount_t seq;
 
 	struct dma_fence __rcu *fence_excl;
-	struct dma_resv_list __rcu *fence;
+	struct dma_resv_fences	readers;
 };
 
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
@@ -129,20 +116,6 @@ dma_resv_fences_get_rcu(struct dma_resv_fences *fences)
 	return dma_fence_get_rcu_safe(&fences->fence);
 }
 
-/**
- * dma_resv_get_list - get the reservation object's
- * shared fence list, with update-side lock held
- * @obj: the reservation object
- *
- * Returns the shared fence list.  Does NOT take references to
- * the fence.  The obj->lock must be held.
- */
-static inline struct dma_resv_list *dma_resv_get_list(struct dma_resv *obj)
-{
-	return rcu_dereference_protected(obj->fence,
-					 dma_resv_held(obj));
-}
-
 /**
  * dma_resv_lock - lock the reservation object
  * @obj: the reservation object
@@ -266,14 +239,7 @@ static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
  */
 static inline void dma_resv_unlock(struct dma_resv *obj)
 {
-#ifdef CONFIG_DEBUG_MUTEXES
-	/* Test shared fence slot reservation */
-	if (rcu_access_pointer(obj->fence)) {
-		struct dma_resv_list *fence = dma_resv_get_list(obj);
-
-		fence->shared_max = fence->shared_count;
-	}
-#endif
+	dma_resv_fences_commit(obj, &obj->readers);
 	ww_mutex_unlock(&obj->lock);
 }
 
@@ -324,8 +290,6 @@ dma_resv_get_excl_rcu(struct dma_resv *obj)
 
 void dma_resv_init(struct dma_resv *obj);
 void dma_resv_fini(struct dma_resv *obj);
-int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
-void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
 
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
 void dma_resv_prune_fences(struct dma_resv *obj);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/10] dma-buf/resv: replace exclusive fence with new fences container
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Replace the exclusive fence implementation with the new writers container
and change all users accordingly.
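
For driver code the conversion is mostly mechanical. A minimal sketch
of the access pattern before and after this patch (illustrative only,
error handling omitted; it assumes the dma_resv_fences helpers and the
dma_fence_array cursor introduced earlier in this series):

	struct dma_fence *excl, *writers, *fence;
	struct dma_fence_array_cursor cursor;

	/* Before: a single exclusive fence slot, grabbed under RCU. */
	rcu_read_lock();
	excl = dma_fence_get_rcu_safe(&resv->fence_excl);
	rcu_read_unlock();

	/* After: the writers are a dma_resv_fences container which may
	 * hold either a single fence or a whole dma_fence_array.
	 */
	rcu_read_lock();
	writers = dma_resv_fences_get_rcu(&resv->writers);
	rcu_read_unlock();

	dma_fence_array_for_each(fence, cursor, writers) {
		/* inspect or sync to each write operation */
	}
	dma_fence_put(writers);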

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-buf.c                    |  20 ++--
 drivers/dma-buf/dma-resv.c                   | 106 +++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c     |   2 +-
 drivers/gpu/drm/drm_gem.c                    |   7 +-
 drivers/gpu/drm/drm_gem_framebuffer_helper.c |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c        |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_busy.c     |  16 ++-
 drivers/gpu/drm/nouveau/nouveau_bo.c         |   3 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c      |   2 +-
 drivers/gpu/drm/radeon/radeon_display.c      |   4 +-
 drivers/gpu/drm/radeon/radeon_sync.c         |   2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c          |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c                 |   2 +-
 include/linux/dma-resv.h                     |  33 +-----
 15 files changed, 78 insertions(+), 132 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 220cfa9ca82b..33fb3608e8ba 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -195,7 +195,7 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 
 static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 {
-	struct dma_fence *fence_excl, *readers;
+	struct dma_fence *writers, *readers;
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv;
 	__poll_t events;
@@ -213,10 +213,10 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		return 0;
 
 	rcu_read_lock();
-	fence_excl = dma_fence_get_rcu_safe(&resv->fence_excl);
+	writers = dma_resv_fences_get_rcu(&resv->writers);
 	readers = dma_resv_fences_get_rcu(&resv->readers);
 
-	if (fence_excl && (!(events & EPOLLOUT) || !readers)) {
+	if (writers && (!(events & EPOLLOUT) || !readers)) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
 		__poll_t pevents = EPOLLIN;
 
@@ -232,16 +232,16 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		spin_unlock_irq(&dmabuf->poll.lock);
 
 		if (events & pevents) {
-			if (!dma_fence_add_callback(fence_excl, &dcb->cb,
+			if (!dma_fence_add_callback(writers, &dcb->cb,
 						    dma_buf_poll_cb)) {
 				events &= ~pevents;
-				dma_fence_put(fence_excl);
+				dma_fence_put(writers);
 			} else {
 				/*
 				 * No callback queued, wake up any additional
 				 * waiters.
 				 */
-				dma_fence_put(fence_excl);
+				dma_fence_put(writers);
 				dma_buf_poll_cb(NULL, &dcb->cb);
 			}
 		}
@@ -1122,7 +1122,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 	struct dma_buf_attachment *attach_obj;
 	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj;
-	struct dma_fence *fence, *readers;
+	struct dma_fence *fence, *writers, *readers;
 	int count = 0, attach_count;
 	size_t size = 0;
 
@@ -1154,10 +1154,10 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 
 		robj = buf_obj->resv;
 		rcu_read_lock();
-		fence = dma_resv_get_excl(robj);
+		writers = dma_resv_fences_get_rcu(&robj->writers);
 		readers = dma_resv_fences_get_rcu(&robj->readers);
-		if (fence)
-			seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
+		dma_fence_array_for_each(fence, cursor, writers)
+			seq_printf(s, "\tWriters fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 8723af0a7a4d..8ef7dbc7fd8e 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -39,22 +39,16 @@
 /**
  * DOC: Reservation Object Overview
  *
- * The reservation object provides a mechanism to manage readers and
- * exclusive fences associated with a buffer.  A reservation object
- * can have attached one exclusive fence (normally associated with
- * write operations) or N read fences.  The RCU mechanism is used to
- * protect read access to fences from locked write-side updates.
+ * The reservation object provides a mechanism to manage readers and writers
+ * associated with a buffer. A reservation object can have any number of read
+ * and write operations attached to it in the form of dma_fence objects. The
+ * RCU mechanism is used to protect read access to the fences from locked
+ * write-side updates.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 static void dma_resv_fences_init(struct dma_resv_fences *fences)
 {
 	RCU_INIT_POINTER(fences->fence, NULL);
@@ -284,9 +278,7 @@ void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
-	RCU_INIT_POINTER(obj->fence_excl, NULL);
+	dma_resv_fences_init(&obj->writers);
 	dma_resv_fences_init(&obj->readers);
 }
 EXPORT_SYMBOL(dma_resv_init);
@@ -301,7 +293,7 @@ void dma_resv_fini(struct dma_resv *obj)
 	 * This object should be dead and all references must have
 	 * been released to it, so no need to be protected with rcu.
 	 */
-	dma_fence_put(rcu_dereference_protected(obj->fence_excl, true));
+	dma_resv_fences_fini(&obj->writers);
 	dma_resv_fences_fini(&obj->readers);
 	ww_mutex_destroy(&obj->lock);
 }
@@ -312,25 +304,14 @@ EXPORT_SYMBOL(dma_resv_fini);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to the exclusive slot.  The obj->lock must be held.
+ * Add a fence to both the writers and the readers.  The obj->lock must be held.
  */
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
-	struct dma_fence *old_fence = dma_resv_get_excl(obj);
-
 	dma_resv_assert_held(obj);
 
-	dma_fence_get(fence);
-
-	preempt_disable();
-	write_seqcount_begin(&obj->seq);
-	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(obj->fence_excl, fence);
+	dma_resv_fences_set(obj, &obj->writers, fence);
 	dma_resv_fences_set(obj, &obj->readers, fence);
-	write_seqcount_end(&obj->seq);
-	preempt_enable();
-
-	dma_fence_put(old_fence);
 }
 EXPORT_SYMBOL(dma_resv_add_excl_fence);
 
@@ -346,11 +327,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
 
 	dma_resv_assert_held(obj);
 
-	fence = dma_resv_get_excl(obj);
-	if (dma_fence_is_signaled(fence)) {
-		RCU_INIT_POINTER(obj->fence_excl, NULL);
-		dma_fence_put(fence);
-	}
+	fence = dma_resv_fences_deref(obj, &obj->writers);
+	if (dma_fence_is_signaled(fence))
+		dma_resv_fences_set(obj, &obj->writers, NULL);
 
 	fence = dma_resv_fences_deref(obj, &obj->readers);
 	if (dma_fence_is_signaled(fence))
@@ -367,36 +346,27 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
 */
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 {
-	struct dma_fence *old, *excl, *readers;
+	struct dma_fence *writers, *readers;
 
 	dma_resv_assert_held(dst);
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&src->fence_excl);
+	writers = dma_resv_fences_get_rcu(&src->writers);
 	readers = dma_resv_fences_get_rcu(&src->readers);
 	rcu_read_unlock();
 
-	old = dma_resv_get_excl(dst);
-
-	preempt_disable();
-	write_seqcount_begin(&dst->seq);
-	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(dst->fence_excl, excl);
+	dma_resv_fences_set(dst, &dst->writers, writers);
 	dma_resv_fences_set(dst, &dst->readers, readers);
-	write_seqcount_end(&dst->seq);
-	preempt_enable();
-
-	dma_fence_put(old);
 
 	return 0;
 }
 EXPORT_SYMBOL(dma_resv_copy_fences);
 
 /**
- * dma_resv_get_fences_rcu - Get an object's readers and exclusive
+ * dma_resv_get_fences_rcu - Get an object's readers and writers
  * fences without update side lock held
  * @obj: the reservation object
- * @pfence_excl: the returned exclusive fence (or NULL)
+ * @pfence_excl: the returned writers fence (or NULL)
  * @pshared_count: the number of shared fences returned
  * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
  * the required size, and must be freed by caller)
@@ -404,30 +374,32 @@ EXPORT_SYMBOL(dma_resv_copy_fences);
  * Retrieve all fences from the reservation object. If the pointer for the
  * exclusive fence is not specified the fence is put into the array of the
  * shared fences as well. Returns either zero or -ENOMEM.
+ *
+ * TODO: Deprecated and shouldn't be used any more.
  */
 int dma_resv_get_fences_rcu(struct dma_resv *obj,
 			    struct dma_fence **pfence_excl,
 			    unsigned *pshared_count,
 			    struct dma_fence ***pshared)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	struct dma_fence **shared;
 	unsigned int shared_count;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
 	shared_count = readers ? 1 : 0;
-	if (excl && !pfence_excl)
+	if (writers && !pfence_excl)
 		++shared_count;
 
 	if (shared_count) {
 		shared = kmalloc_array(shared_count, sizeof(*shared),
 				       GFP_KERNEL);
 		if (!shared) {
-			dma_fence_put(excl);
+			dma_fence_put(writers);
 			dma_fence_put(readers);
 			return -ENOMEM;
 		}
@@ -435,8 +407,8 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 		shared_count = 0;
 		if (readers)
 			shared[shared_count++] = readers;
-		if (excl && !pfence_excl)
-			shared[shared_count++] = excl;
+		if (writers && !pfence_excl)
+			shared[shared_count++] = writers;
 
 		*pshared = shared;
 		*pshared_count = shared_count;
@@ -446,7 +418,7 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 	}
 
 	if (pfence_excl)
-		*pfence_excl = excl;
+		*pfence_excl = writers;
 
 	return 0;
 }
@@ -454,9 +426,9 @@ EXPORT_SYMBOL_GPL(dma_resv_get_fences_rcu);
 
 /**
  * dma_resv_wait_timeout_rcu - Wait on reservation's objects
- * readers and/or exclusive fences.
+ * readers and/or writers.
  * @obj: the reservation object
- * @wait_all: if true, wait on all fences, else wait on just exclusive fence
+ * @wait_all: if true, wait on all fences, else wait on just writers
  * @intr: if true, do interruptible wait
  * @timeout: timeout value in jiffies or zero to return immediately
  *
@@ -468,11 +440,11 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			       bool wait_all, bool intr,
 			       unsigned long timeout)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	long ret = timeout ? timeout : 1;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
@@ -482,11 +454,11 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			goto out;
 	}
 
-	if (excl)
-		ret = dma_fence_wait_timeout(excl, intr, ret);
+	if (writers)
+		ret = dma_fence_wait_timeout(writers, intr, ret);
 
 out:
-	dma_fence_put(excl);
+	dma_fence_put(writers);
 	dma_fence_put(readers);
 	return ret;
 }
@@ -496,7 +468,7 @@ EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
  * dma_resv_test_signaled_rcu - Test if a reservation object's
  * fences have been signaled.
  * @obj: the reservation object
- * @test_all: if true, test all fences, otherwise only test the exclusive
+ * @test_all: if true, test all fences, otherwise only test the writers
  * fence
  *
  * RETURNS
@@ -504,21 +476,21 @@ EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
  */
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	bool ret = true;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
-	if (excl)
-		ret = dma_fence_is_signaled(excl);
+	if (writers)
+		ret = dma_fence_is_signaled(writers);
 
 	if (test_all && readers)
 		ret &= dma_fence_is_signaled(readers);
 
-	dma_fence_put(excl);
+	dma_fence_put(writers);
 	dma_fence_put(readers);
 
 	return ret;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 3e4685565f82..fde3aa1ad76d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -141,7 +141,7 @@ __dma_resv_make_exclusive(struct dma_resv *obj)
 {
 	struct dma_fence *readers = dma_fence_get_rcu(obj->readers.fence);
 
-	dma_resv_add_excl_fence(obj, readers);
+	dma_resv_fences_set(obj, &obj->writers, readers);
 	dma_fence_put(readers);
 
 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index f8bc5baf0a44..7ba84897d52d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -204,7 +204,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 		return -EINVAL;
 
 	/* always sync to the exclusive fence */
-	f = dma_resv_get_excl(resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	r = amdgpu_sync_fence(adev, sync, f, false);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6854f5867d51..213410f8af6e 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1412,14 +1412,13 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
 	unsigned int i, fence_count;
 
 	if (!write) {
-		struct dma_fence *fence =
-			dma_resv_get_excl_rcu(obj->resv);
+		struct dma_fence *fence;
 
+		fence = dma_resv_fences_get_rcu(&obj->resv->writers);
 		return drm_gem_fence_array_add(fence_array, fence);
 	}
 
-	ret = dma_resv_get_fences_rcu(obj->resv, NULL,
-						&fence_count, &fences);
+	ret = dma_resv_get_fences_rcu(obj->resv, NULL, &fence_count, &fences);
 	if (ret || !fence_count)
 		return ret;
 
diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
index b9bcd310ca2d..72af3547d2e3 100644
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -294,7 +294,7 @@ int drm_gem_fb_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	obj = drm_gem_fb_get_obj(state->fb, 0);
-	fence = dma_resv_get_excl_rcu(obj->resv);
+	fence = dma_resv_fences_get_rcu(&obj->resv->writers);
 	drm_atomic_set_fence_for_plane(state, fence);
 
 	return 0;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 15f36ccd6e53..df6452f90576 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -460,9 +460,9 @@ static void etnaviv_gem_describe_fence(struct dma_fence *fence,
 static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 {
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+	struct dma_fence *fence, *writers, *readers;
 	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj = obj->resv;
-	struct dma_fence *fence, *readers;
 	unsigned long off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08lx %p %zd\n",
@@ -474,8 +474,8 @@ static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 	readers = dma_resv_fences_deref(robj, &robj->readers);
 	dma_fence_array_for_each(fence, cursor, readers)
 		etnaviv_gem_describe_fence(fence, "Shared", m);
-	fence = rcu_dereference(robj->fence_excl);
-	if (fence)
+	writers = dma_resv_fences_deref(robj, &robj->writers);
+	dma_fence_array_for_each(fence, cursor, writers)
 		etnaviv_gem_describe_fence(fence, "Exclusive", m);
 	rcu_read_unlock();
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
index 0f18ba97b228..36fe5881667a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
@@ -84,9 +84,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_busy *args = data;
 	struct dma_fence_array_cursor cursor;
-	struct dma_fence *fence, *readers;
+	struct dma_fence *fence, *writers, *readers;
 	struct drm_i915_gem_object *obj;
-	unsigned int seq;
 	int err;
 
 	err = -ENOENT;
@@ -112,21 +111,18 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * to report the overall busyness. This is what the wait-ioctl does.
 	 *
 	 */
-retry:
-	seq = raw_read_seqcount(&obj->base.resv->seq);
+	writers = dma_resv_fences_get_rcu(&obj->base.resv->writers);
+	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
 
 	/* Translate the exclusive fence to the READ *and* WRITE engine */
-	args->busy =
-		busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
+	args->busy = busy_check_writer(writers);
 
 	/* Translate shared fences to READ set of engines */
-	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
 	dma_fence_array_for_each(fence, cursor, readers)
 		args->busy |= busy_check_reader(fence);
-	dma_fence_put(readers);
 
-	if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
-		goto retry;
+	dma_fence_put(writers);
+	dma_fence_put(readers);
 
 	err = 0;
 out:
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 6b3fad03b342..f865eadae965 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1324,8 +1324,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
 {
 	struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
 	struct drm_device *dev = drm->dev;
-	struct dma_fence *fence = dma_resv_get_excl(bo->base.resv);
+	struct dma_fence *fence;
 
+	fence = dma_resv_fences_deref(bo->base.resv, &bo->base.resv->writers);
 	nv10_bo_put_tile_region(dev, *old_tile, fence);
 	*old_tile = new_tile;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 350cd2a9ea51..a97878fbbc96 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -348,7 +348,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 			return ret;
 	}
 
-	fence = dma_resv_get_excl(resv);
+	fence = dma_resv_fences_deref(resv, &resv->writers);
 	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index e81b01f8db90..6ef589081c6b 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -533,7 +533,9 @@ static int radeon_crtc_page_flip_target(struct drm_crtc *crtc,
 		DRM_ERROR("failed to pin new rbo buffer before flip\n");
 		goto cleanup;
 	}
-	work->fence = dma_fence_get(dma_resv_get_excl(new_rbo->tbo.base.resv));
+	work->fence = dma_resv_fences_deref(new_rbo->tbo.base.resv,
+					    &new_rbo->tbo.base.resv->writers);
+	dma_fence_get(work->fence);
 	radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL);
 	radeon_bo_unreserve(new_rbo);
 
diff --git a/drivers/gpu/drm/radeon/radeon_sync.c b/drivers/gpu/drm/radeon/radeon_sync.c
index 971dd6c01e60..bf595428dabf 100644
--- a/drivers/gpu/drm/radeon/radeon_sync.c
+++ b/drivers/gpu/drm/radeon/radeon_sync.c
@@ -97,7 +97,7 @@ int radeon_sync_resv(struct radeon_device *rdev,
 	int r = 0;
 
 	/* always sync to the exclusive fence */
-	f = dma_resv_get_excl(resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	fence = f ? to_radeon_fence(f) : NULL;
 	if (fence && fence->rdev == rdev)
 		radeon_sync_fence(sync, fence);
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 1ad5c3b86b64..72bfcddbce2b 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -465,6 +465,7 @@ static int radeon_uvd_validate_codec(struct radeon_cs_parser *p,
 static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 			     unsigned offset, unsigned buf_sizes[])
 {
+	struct dma_resv *resv = bo->tbo.base.resv;
 	int32_t *msg, msg_type, handle;
 	unsigned img_size = 0;
 	struct dma_fence *f;
@@ -477,7 +478,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 		return -EINVAL;
 	}
 
-	f = dma_resv_get_excl(bo->tbo.base.resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	if (f) {
 		r = radeon_fence_wait((struct radeon_fence *)f, false);
 		if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 19eeefbbb65a..53d4035778d0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -455,7 +455,7 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
 {
 	struct dma_fence *fence;
 
-	fence = dma_resv_get_excl(&bo->base._resv);
+	fence = dma_resv_fences_deref(&bo->base._resv, &bo->base._resv.writers);
 	if (fence && !fence->ops->signaled)
 		dma_fence_enable_sw_signaling(fence);
 
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index b23e16975f39..72c3c4f99711 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -48,8 +48,6 @@
 struct dma_resv;
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_fences - fences inside a reservation object
@@ -76,15 +74,13 @@ void dma_resv_fences_commit(struct dma_resv *obj,
  * struct dma_resv - a reservation object manages fences for a buffer
  * @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
+ * @writers: array of write operations for implicit sync
  * @readers: array of read operations for implicit sync
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
 
-	struct dma_fence __rcu *fence_excl;
-	struct dma_resv_fences	readers;
+	struct dma_resv_fences writers;
+	struct dma_resv_fences readers;
 };
 
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
@@ -243,25 +240,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj)
 	ww_mutex_unlock(&obj->lock);
 }
 
-/**
- * dma_resv_get_excl - get the reservation object's
- * exclusive fence, with update-side lock held
- * @obj: the reservation object
- *
- * Returns the exclusive fence (if any).  Does NOT take a
- * reference. Writers must hold obj->lock, readers may only
- * hold a RCU read side lock.
- *
- * RETURNS
- * The exclusive fence or NULL
- */
-static inline struct dma_fence *
-dma_resv_get_excl(struct dma_resv *obj)
-{
-	return rcu_dereference_protected(obj->fence_excl,
-					 dma_resv_held(obj));
-}
-
 /**
  * dma_resv_get_excl_rcu - get the reservation object's
  * exclusive fence, without lock held.
@@ -278,11 +256,8 @@ dma_resv_get_excl_rcu(struct dma_resv *obj)
 {
 	struct dma_fence *fence;
 
-	if (!rcu_access_pointer(obj->fence_excl))
-		return NULL;
-
 	rcu_read_lock();
-	fence = dma_fence_get_rcu_safe(&obj->fence_excl);
+	fence = dma_resv_fences_get_rcu(&obj->writers);
 	rcu_read_unlock();
 
 	return fence;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/10] dma-buf/resv: replace exclusive fence with new fences container
@ 2019-08-21 12:31   ` Christian König
  0 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Replace the exclusive fence implementation with the new writers container
and change all users accordingly.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-buf.c                    |  20 ++--
 drivers/dma-buf/dma-resv.c                   | 106 +++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c     |   2 +-
 drivers/gpu/drm/drm_gem.c                    |   7 +-
 drivers/gpu/drm/drm_gem_framebuffer_helper.c |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c        |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_busy.c     |  16 ++-
 drivers/gpu/drm/nouveau/nouveau_bo.c         |   3 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c      |   2 +-
 drivers/gpu/drm/radeon/radeon_display.c      |   4 +-
 drivers/gpu/drm/radeon/radeon_sync.c         |   2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c          |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c                 |   2 +-
 include/linux/dma-resv.h                     |  33 +-----
 15 files changed, 78 insertions(+), 132 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 220cfa9ca82b..33fb3608e8ba 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -195,7 +195,7 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 
 static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 {
-	struct dma_fence *fence_excl, *readers;
+	struct dma_fence *writers, *readers;
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv;
 	__poll_t events;
@@ -213,10 +213,10 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		return 0;
 
 	rcu_read_lock();
-	fence_excl = dma_fence_get_rcu_safe(&resv->fence_excl);
+	writers = dma_resv_fences_get_rcu(&resv->writers);
 	readers = dma_resv_fences_get_rcu(&resv->readers);
 
-	if (fence_excl && (!(events & EPOLLOUT) || !readers)) {
+	if (writers && (!(events & EPOLLOUT) || !readers)) {
 		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
 		__poll_t pevents = EPOLLIN;
 
@@ -232,16 +232,16 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		spin_unlock_irq(&dmabuf->poll.lock);
 
 		if (events & pevents) {
-			if (!dma_fence_add_callback(fence_excl, &dcb->cb,
+			if (!dma_fence_add_callback(writers, &dcb->cb,
 						    dma_buf_poll_cb)) {
 				events &= ~pevents;
-				dma_fence_put(fence_excl);
+				dma_fence_put(writers);
 			} else {
 				/*
 				 * No callback queued, wake up any additional
 				 * waiters.
 				 */
-				dma_fence_put(fence_excl);
+				dma_fence_put(writers);
 				dma_buf_poll_cb(NULL, &dcb->cb);
 			}
 		}
@@ -1122,7 +1122,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 	struct dma_buf_attachment *attach_obj;
 	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj;
-	struct dma_fence *fence, *readers;
+	struct dma_fence *fence, *writers, *readers;
 	int count = 0, attach_count;
 	size_t size = 0;
 
@@ -1154,10 +1154,10 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 
 		robj = buf_obj->resv;
 		rcu_read_lock();
-		fence = dma_resv_get_excl(robj);
+		writers = dma_resv_fences_get_rcu(&robj->writers);
 		readers = dma_resv_fences_get_rcu(&robj->readers);
-		if (fence)
-			seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
+		dma_fence_array_for_each(fence, cursor, writers)
+			seq_printf(s, "\tWriters fence: %s %s %ssignalled\n",
 				   fence->ops->get_driver_name(fence),
 				   fence->ops->get_timeline_name(fence),
 				   dma_fence_is_signaled(fence) ? "" : "un");
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 8723af0a7a4d..8ef7dbc7fd8e 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -39,22 +39,16 @@
 /**
  * DOC: Reservation Object Overview
  *
- * The reservation object provides a mechanism to manage readers and
- * exclusive fences associated with a buffer.  A reservation object
- * can have attached one exclusive fence (normally associated with
- * write operations) or N read fences.  The RCU mechanism is used to
- * protect read access to fences from locked write-side updates.
+ * The reservation object provides a mechanism to manage readers and writers
+ * associated with a buffer. A reservation object can have attached any number
+ * of read and writer operation in the form of dma_fence objects. The RCU
+ * mechanism is used to protect read access to fences from locked write-side
+ * updates.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 static void dma_resv_fences_init(struct dma_resv_fences *fences)
 {
 	RCU_INIT_POINTER(fences->fence, NULL);
@@ -284,9 +278,7 @@ void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
-	RCU_INIT_POINTER(obj->fence_excl, NULL);
+	dma_resv_fences_init(&obj->writers);
 	dma_resv_fences_init(&obj->readers);
 }
 EXPORT_SYMBOL(dma_resv_init);
@@ -301,7 +293,7 @@ void dma_resv_fini(struct dma_resv *obj)
 	 * This object should be dead and all references must have
 	 * been released to it, so no need to be protected with rcu.
 	 */
-	dma_fence_put(rcu_dereference_protected(obj->fence_excl, true));
+	dma_resv_fences_fini(&obj->writers);
 	dma_resv_fences_fini(&obj->readers);
 	ww_mutex_destroy(&obj->lock);
 }
@@ -312,25 +304,14 @@ EXPORT_SYMBOL(dma_resv_fini);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to the exclusive slot.  The obj->lock must be held.
+ * Add a fence to the exclusive slots.  The obj->lock must be held.
  */
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
-	struct dma_fence *old_fence = dma_resv_get_excl(obj);
-
 	dma_resv_assert_held(obj);
 
-	dma_fence_get(fence);
-
-	preempt_disable();
-	write_seqcount_begin(&obj->seq);
-	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(obj->fence_excl, fence);
+	dma_resv_fences_set(obj, &obj->writers, fence);
 	dma_resv_fences_set(obj, &obj->readers, fence);
-	write_seqcount_end(&obj->seq);
-	preempt_enable();
-
-	dma_fence_put(old_fence);
 }
 EXPORT_SYMBOL(dma_resv_add_excl_fence);
 
@@ -346,11 +327,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
 
 	dma_resv_assert_held(obj);
 
-	fence = dma_resv_get_excl(obj);
-	if (dma_fence_is_signaled(fence)) {
-		RCU_INIT_POINTER(obj->fence_excl, NULL);
-		dma_fence_put(fence);
-	}
+	fence = dma_resv_fences_deref(obj, &obj->writers);
+	if (dma_fence_is_signaled(fence))
+		dma_resv_fences_set(obj, &obj->writers, NULL);
 
 	fence = dma_resv_fences_deref(obj, &obj->readers);
 	if (dma_fence_is_signaled(fence))
@@ -367,36 +346,27 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
 */
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 {
-	struct dma_fence *old, *excl, *readers;
+	struct dma_fence *writers, *readers;
 
 	dma_resv_assert_held(dst);
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&src->fence_excl);
+	writers = dma_resv_fences_get_rcu(&src->writers);
 	readers = dma_resv_fences_get_rcu(&src->readers);
 	rcu_read_unlock();
 
-	old = dma_resv_get_excl(dst);
-
-	preempt_disable();
-	write_seqcount_begin(&dst->seq);
-	/* write_seqcount_begin provides the necessary memory barrier */
-	RCU_INIT_POINTER(dst->fence_excl, excl);
+	dma_resv_fences_set(dst, &dst->writers, writers);
 	dma_resv_fences_set(dst, &dst->readers, readers);
-	write_seqcount_end(&dst->seq);
-	preempt_enable();
-
-	dma_fence_put(old);
 
 	return 0;
 }
 EXPORT_SYMBOL(dma_resv_copy_fences);
 
 /**
- * dma_resv_get_fences_rcu - Get an object's readers and exclusive
+ * dma_resv_get_fences_rcu - Get an object's readers and writers
  * fences without update side lock held
  * @obj: the reservation object
- * @pfence_excl: the returned exclusive fence (or NULL)
+ * @pfence_excl: the returned writers (or NULL)
  * @pshared_count: the number of shared fences returned
  * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
  * the required size, and must be freed by caller)
@@ -404,30 +374,32 @@ EXPORT_SYMBOL(dma_resv_copy_fences);
  * Retrieve all fences from the reservation object. If the pointer for the
  * exclusive fence is not specified the fence is put into the array of the
  * shared fences as well. Returns either zero or -ENOMEM.
+ *
+ * TODO: Deprecated and shouldn't be used any more.
  */
 int dma_resv_get_fences_rcu(struct dma_resv *obj,
 			    struct dma_fence **pfence_excl,
 			    unsigned *pshared_count,
 			    struct dma_fence ***pshared)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	struct dma_fence **shared;
 	unsigned int shared_count;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
 	shared_count = readers ? 1 : 0;
-	if (excl && !pfence_excl)
+	if (writers && !pfence_excl)
 		++shared_count;
 
 	if (shared_count) {
 		shared = kmalloc_array(shared_count, sizeof(*shared),
 				       GFP_KERNEL);
 		if (!shared) {
-			dma_fence_put(excl);
+			dma_fence_put(writers);
 			dma_fence_put(readers);
 			return -ENOMEM;
 		}
@@ -435,8 +407,8 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 		shared_count = 0;
 		if (readers)
 			shared[shared_count++] = readers;
-		if (excl && !pfence_excl)
-			shared[shared_count++] = excl;
+		if (writers && !pfence_excl)
+			shared[shared_count++] = writers;
 
 		*pshared = shared;
 		*pshared_count = shared_count;
@@ -446,7 +418,7 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 	}
 
 	if (pfence_excl)
-		*pfence_excl = excl;
+		*pfence_excl = writers;
 
 	return 0;
 }
@@ -454,9 +426,9 @@ EXPORT_SYMBOL_GPL(dma_resv_get_fences_rcu);
 
 /**
  * dma_resv_wait_timeout_rcu - Wait on reservation's objects
- * readers and/or exclusive fences.
+ * readers and/or writers.
  * @obj: the reservation object
- * @wait_all: if true, wait on all fences, else wait on just exclusive fence
+ * @wait_all: if true, wait on all fences, else wait on just writers
  * @intr: if true, do interruptible wait
  * @timeout: timeout value in jiffies or zero to return immediately
  *
@@ -468,11 +440,11 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			       bool wait_all, bool intr,
 			       unsigned long timeout)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	long ret = timeout ? timeout : 1;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
@@ -482,11 +454,11 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			goto out;
 	}
 
-	if (excl)
-		ret = dma_fence_wait_timeout(excl, intr, ret);
+	if (writers)
+		ret = dma_fence_wait_timeout(writers, intr, ret);
 
 out:
-	dma_fence_put(excl);
+	dma_fence_put(writers);
 	dma_fence_put(readers);
 	return ret;
 }
@@ -496,7 +468,7 @@ EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
  * dma_resv_test_signaled_rcu - Test if a reservation object's
  * fences have been signaled.
  * @obj: the reservation object
- * @test_all: if true, test all fences, otherwise only test the exclusive
+ * @test_all: if true, test all fences, otherwise only test the writers
  * fence
  *
  * RETURNS
@@ -504,21 +476,21 @@ EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
  */
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 {
-	struct dma_fence *excl, *readers;
+	struct dma_fence *writers, *readers;
 	bool ret = true;
 
 	rcu_read_lock();
-	excl = dma_fence_get_rcu_safe(&obj->fence_excl);
+	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
 	rcu_read_unlock();
 
-	if (excl)
-		ret = dma_fence_is_signaled(excl);
+	if (writers)
+		ret = dma_fence_is_signaled(writers);
 
 	if (test_all && readers)
 		ret &= dma_fence_is_signaled(readers);
 
-	dma_fence_put(excl);
+	dma_fence_put(writers);
 	dma_fence_put(readers);
 
 	return ret;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 3e4685565f82..fde3aa1ad76d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -141,7 +141,7 @@ __dma_resv_make_exclusive(struct dma_resv *obj)
 {
 	struct dma_fence *readers = dma_fence_get_rcu(obj->readers.fence);
 
-	dma_resv_add_excl_fence(obj, readers);
+	dma_resv_fences_set(obj, &obj->writers, readers);
 	dma_fence_put(readers);
 
 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index f8bc5baf0a44..7ba84897d52d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -204,7 +204,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 		return -EINVAL;
 
 	/* always sync to the exclusive fence */
-	f = dma_resv_get_excl(resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	r = amdgpu_sync_fence(adev, sync, f, false);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6854f5867d51..213410f8af6e 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1412,14 +1412,13 @@ int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
 	unsigned int i, fence_count;
 
 	if (!write) {
-		struct dma_fence *fence =
-			dma_resv_get_excl_rcu(obj->resv);
+		struct dma_fence *fence;
 
+		fence = dma_resv_fences_deref(obj->resv, &obj->resv->writers);
 		return drm_gem_fence_array_add(fence_array, fence);
 	}
 
-	ret = dma_resv_get_fences_rcu(obj->resv, NULL,
-						&fence_count, &fences);
+	ret = dma_resv_get_fences_rcu(obj->resv, NULL, &fence_count, &fences);
 	if (ret || !fence_count)
 		return ret;
 
diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
index b9bcd310ca2d..72af3547d2e3 100644
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -294,7 +294,7 @@ int drm_gem_fb_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	obj = drm_gem_fb_get_obj(state->fb, 0);
-	fence = dma_resv_get_excl_rcu(obj->resv);
+	fence = dma_resv_fences_get_rcu(&obj->resv->writers);
 	drm_atomic_set_fence_for_plane(state, fence);
 
 	return 0;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 15f36ccd6e53..df6452f90576 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -460,9 +460,9 @@ static void etnaviv_gem_describe_fence(struct dma_fence *fence,
 static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 {
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+	struct dma_fence *fence, *writers, *readers;
 	struct dma_fence_array_cursor cursor;
 	struct dma_resv *robj = obj->resv;
-	struct dma_fence *fence, *readers;
 	unsigned long off = drm_vma_node_start(&obj->vma_node);
 
 	seq_printf(m, "%08x: %c %2d (%2d) %08lx %p %zd\n",
@@ -474,8 +474,8 @@ static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 	readers = dma_resv_fences_deref(robj, &robj->readers);
 	dma_fence_array_for_each(fence, cursor, readers)
 		etnaviv_gem_describe_fence(fence, "Shared", m);
-	fence = rcu_dereference(robj->fence_excl);
-	if (fence)
+	writers = dma_resv_fences_deref(robj, &robj->writers);
+	dma_fence_array_for_each(fence, cursor, readers)
 		etnaviv_gem_describe_fence(fence, "Exclusive", m);
 	rcu_read_unlock();
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
index 0f18ba97b228..36fe5881667a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
@@ -84,9 +84,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_busy *args = data;
 	struct dma_fence_array_cursor cursor;
-	struct dma_fence *fence, *readers;
+	struct dma_fence *fence, *writers, *readers;
 	struct drm_i915_gem_object *obj;
-	unsigned int seq;
 	int err;
 
 	err = -ENOENT;
@@ -112,21 +111,18 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * to report the overall busyness. This is what the wait-ioctl does.
 	 *
 	 */
-retry:
-	seq = raw_read_seqcount(&obj->base.resv->seq);
+	writers = dma_resv_fences_get_rcu(&obj->base.resv->writers);
+	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
 
 	/* Translate the exclusive fence to the READ *and* WRITE engine */
-	args->busy =
-		busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
+	args->busy = busy_check_writer(writers);
 
 	/* Translate shared fences to READ set of engines */
-	readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
 	dma_fence_array_for_each(fence, cursor, readers)
 		args->busy |= busy_check_reader(fence);
-	dma_fence_put(readers);
 
-	if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
-		goto retry;
+	dma_fence_put(writers);
+	dma_fence_put(readers);
 
 	err = 0;
 out:
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 6b3fad03b342..f865eadae965 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1324,8 +1324,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
 {
 	struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
 	struct drm_device *dev = drm->dev;
-	struct dma_fence *fence = dma_resv_get_excl(bo->base.resv);
+	struct dma_fence *fence;
 
+	fence = dma_resv_fences_deref(bo->base.resv, &bo->base.resv->writers);
 	nv10_bo_put_tile_region(dev, *old_tile, fence);
 	*old_tile = new_tile;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 350cd2a9ea51..a97878fbbc96 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -348,7 +348,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 			return ret;
 	}
 
-	fence = dma_resv_get_excl(resv);
+	fence = dma_resv_fences_deref(resv, &resv->writers);
 	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index e81b01f8db90..6ef589081c6b 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -533,7 +533,9 @@ static int radeon_crtc_page_flip_target(struct drm_crtc *crtc,
 		DRM_ERROR("failed to pin new rbo buffer before flip\n");
 		goto cleanup;
 	}
-	work->fence = dma_fence_get(dma_resv_get_excl(new_rbo->tbo.base.resv));
+	work->fence = dma_resv_fences_deref(new_rbo->tbo.base.resv,
+					    &new_rbo->tbo.base.resv->writers);
+	dma_fence_get(work->fence);
 	radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL);
 	radeon_bo_unreserve(new_rbo);
 
diff --git a/drivers/gpu/drm/radeon/radeon_sync.c b/drivers/gpu/drm/radeon/radeon_sync.c
index 971dd6c01e60..bf595428dabf 100644
--- a/drivers/gpu/drm/radeon/radeon_sync.c
+++ b/drivers/gpu/drm/radeon/radeon_sync.c
@@ -97,7 +97,7 @@ int radeon_sync_resv(struct radeon_device *rdev,
 	int r = 0;
 
 	/* always sync to the exclusive fence */
-	f = dma_resv_get_excl(resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	fence = f ? to_radeon_fence(f) : NULL;
 	if (fence && fence->rdev == rdev)
 		radeon_sync_fence(sync, fence);
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 1ad5c3b86b64..72bfcddbce2b 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -465,6 +465,7 @@ static int radeon_uvd_validate_codec(struct radeon_cs_parser *p,
 static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 			     unsigned offset, unsigned buf_sizes[])
 {
+	struct dma_resv *resv = bo->tbo.base.resv;
 	int32_t *msg, msg_type, handle;
 	unsigned img_size = 0;
 	struct dma_fence *f;
@@ -477,7 +478,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 		return -EINVAL;
 	}
 
-	f = dma_resv_get_excl(bo->tbo.base.resv);
+	f = dma_resv_fences_deref(resv, &resv->writers);
 	if (f) {
 		r = radeon_fence_wait((struct radeon_fence *)f, false);
 		if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 19eeefbbb65a..53d4035778d0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -455,7 +455,7 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
 {
 	struct dma_fence *fence;
 
-	fence = dma_resv_get_excl(&bo->base._resv);
+	fence = dma_resv_fences_deref(&bo->base._resv, &bo->base._resv.writers);
 	if (fence && !fence->ops->signaled)
 		dma_fence_enable_sw_signaling(fence);
 
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index b23e16975f39..72c3c4f99711 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -48,8 +48,6 @@
 struct dma_resv;
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_fences - fences inside a reservation object
@@ -76,15 +74,14 @@ void dma_resv_fences_commit(struct dma_resv *obj,
  * struct dma_resv - a reservation object manages fences for a buffer
  * @lock: update side lock
  * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
+ * @writers: array of write operations for implicit sync
  * @readers: array of read operations for implicit sync
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
 
-	struct dma_fence __rcu *fence_excl;
-	struct dma_resv_fences	readers;
+	struct dma_resv_fences writers;
+	struct dma_resv_fences readers;
 };
 
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
@@ -243,25 +240,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj)
 	ww_mutex_unlock(&obj->lock);
 }
 
-/**
- * dma_resv_get_excl - get the reservation object's
- * exclusive fence, with update-side lock held
- * @obj: the reservation object
- *
- * Returns the exclusive fence (if any).  Does NOT take a
- * reference. Writers must hold obj->lock, readers may only
- * hold a RCU read side lock.
- *
- * RETURNS
- * The exclusive fence or NULL
- */
-static inline struct dma_fence *
-dma_resv_get_excl(struct dma_resv *obj)
-{
-	return rcu_dereference_protected(obj->fence_excl,
-					 dma_resv_held(obj));
-}
-
 /**
  * dma_resv_get_excl_rcu - get the reservation object's
  * exclusive fence, without lock held.
@@ -278,11 +256,8 @@ dma_resv_get_excl_rcu(struct dma_resv *obj)
 {
 	struct dma_fence *fence;
 
-	if (!rcu_access_pointer(obj->fence_excl))
-		return NULL;
-
 	rcu_read_lock();
-	fence = dma_fence_get_rcu_safe(&obj->fence_excl);
+	fence = dma_resv_fences_get_rcu(&obj->writers);
 	rcu_read_unlock();
 
 	return fence;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/10] dma-buf/resv: add other operations
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 12:31   ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-21 12:31 UTC (permalink / raw)
  To: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

In addition to readers and writers, add a third class of operations
which never participates in implicit synchronization.

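The intended use is the same as for the reader and writer buckets, e.g.
roughly like this (untested sketch; the function name and the eviction
fence use case are made up for illustration):

	/*
	 * Queue a fence which memory management must wait for, but
	 * which must not participate in implicit synchronization.
	 */
	static int example_add_other_fence(struct dma_resv *resv,
					   struct dma_fence *fence)
	{
		int r;

		r = dma_resv_lock(resv, NULL);
		if (r)
			return r;

		r = dma_resv_fences_reserve(resv, &resv->others, 1);
		if (!r) {
			dma_resv_fences_add(&resv->others, fence);
			dma_resv_fences_commit(resv, &resv->others);
		}

		dma_resv_unlock(resv);
		return r;
	}

Such fences are then only waited on when wait_all is specified, e.g. in
dma_resv_wait_timeout_rcu().
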
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c | 27 ++++++++++++++++++++++++---
 include/linux/dma-resv.h   |  2 ++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 8ef7dbc7fd8e..c6dd6c36dba2 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -280,6 +280,7 @@ void dma_resv_init(struct dma_resv *obj)
 
 	dma_resv_fences_init(&obj->writers);
 	dma_resv_fences_init(&obj->readers);
+	dma_resv_fences_init(&obj->others);
 }
 EXPORT_SYMBOL(dma_resv_init);
 
@@ -295,6 +296,7 @@ void dma_resv_fini(struct dma_resv *obj)
 	 */
 	dma_resv_fences_fini(&obj->writers);
 	dma_resv_fences_fini(&obj->readers);
+	dma_resv_fences_fini(&obj->others);
 	ww_mutex_destroy(&obj->lock);
 }
 EXPORT_SYMBOL(dma_resv_fini);
@@ -334,6 +336,10 @@ void dma_resv_prune_fences(struct dma_resv *obj)
 	fence = dma_resv_fences_deref(obj, &obj->readers);
 	if (dma_fence_is_signaled(fence))
 		dma_resv_fences_set(obj, &obj->readers, NULL);
+
+	fence = dma_resv_fences_deref(obj, &obj->others);
+	if (dma_fence_is_signaled(fence))
+		dma_resv_fences_set(obj, &obj->others, NULL);
 }
 EXPORT_SYMBOL(dma_resv_prune_fences);
 
@@ -346,17 +352,19 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
 */
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 {
-	struct dma_fence *writers, *readers;
+	struct dma_fence *writers, *readers, *others;
 
 	dma_resv_assert_held(dst);
 
 	rcu_read_lock();
 	writers = dma_resv_fences_get_rcu(&src->writers);
 	readers = dma_resv_fences_get_rcu(&src->readers);
+	others = dma_resv_fences_get_rcu(&src->others);
 	rcu_read_unlock();
 
 	dma_resv_fences_set(dst, &dst->writers, writers);
 	dma_resv_fences_set(dst, &dst->readers, readers);
+	dma_resv_fences_set(dst, &dst->others, others);
 
 	return 0;
 }
@@ -440,12 +448,13 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			       bool wait_all, bool intr,
 			       unsigned long timeout)
 {
-	struct dma_fence *writers, *readers;
+	struct dma_fence *writers, *readers, *others;
 	long ret = timeout ? timeout : 1;
 
 	rcu_read_lock();
 	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
+	others = dma_resv_fences_get_rcu(&obj->others);
 	rcu_read_unlock();
 
 	if (wait_all && readers) {
@@ -454,12 +463,19 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
 			goto out;
 	}
 
+	if (wait_all && others) {
+		ret = dma_fence_wait_timeout(others, intr, ret);
+		if (ret <= 0)
+			goto out;
+	}
+
 	if (writers)
 		ret = dma_fence_wait_timeout(writers, intr, ret);
 
 out:
 	dma_fence_put(writers);
 	dma_fence_put(readers);
+	dma_fence_put(others);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
@@ -476,12 +492,13 @@ EXPORT_SYMBOL_GPL(dma_resv_wait_timeout_rcu);
  */
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 {
-	struct dma_fence *writers, *readers;
+	struct dma_fence *writers, *readers, *others;
 	bool ret = true;
 
 	rcu_read_lock();
 	writers = dma_resv_fences_get_rcu(&obj->writers);
 	readers = dma_resv_fences_get_rcu(&obj->readers);
+	others = dma_resv_fences_get_rcu(&obj->others);
 	rcu_read_unlock();
 
 	if (writers)
@@ -490,8 +507,12 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 	if (test_all && readers)
 		ret &= dma_fence_is_signaled(readers);
 
+	if (test_all && others)
+		ret &= dma_fence_is_signaled(others);
+
 	dma_fence_put(writers);
 	dma_fence_put(readers);
+	dma_fence_put(others);
 
 	return ret;
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 72c3c4f99711..bf8d21cc7720 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -76,12 +76,14 @@ void dma_resv_fences_commit(struct dma_resv *obj,
  * @seq: sequence count for managing RCU read-side synchronization
  * @writers: array of write operations for implicit sync
  * @readers: array of read operations for implicit sync
+ * @others: other operations not participating in implicit sync
  */
 struct dma_resv {
 	struct ww_mutex lock;
 
 	struct dma_resv_fences writers;
 	struct dma_resv_fences readers;
+	struct dma_resv_fences others;
 };
 
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/10] dma-buf/resv: add dma_resv_prune_fences
  2019-08-21 12:31   ` Christian König
@ 2019-08-21 14:55     ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 14:55 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:42)
> Add a new dma_resv_prune_fences() function to improve memory management.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-resv.c                 | 37 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c   |  3 +-
>  drivers/gpu/drm/i915/i915_gem_batch_pool.c |  2 +-
>  drivers/gpu/drm/i915/i915_vma.c            |  3 +-
>  drivers/gpu/drm/ttm/ttm_bo.c               |  2 +-
>  include/linux/dma-resv.h                   |  1 +
>  6 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 42a8f3f11681..24adc32d36d4 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -301,6 +301,43 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
>  }
>  EXPORT_SYMBOL(dma_resv_add_excl_fence);
>  
> +/**
> + * dma_resv_prune_fences - prune signaled fences from the resv object
> + * @obj: the reservation object to prune
> + *
> + * Prune all signaled fences from the reservation object.
> + */
> +void dma_resv_prune_fences(struct dma_resv *obj)
> +{
> +       struct dma_resv_list *list;
> +       struct dma_fence *fence;
> +       unsigned int i;
> +
> +       dma_resv_assert_held(obj);
> +
> +       fence = dma_resv_get_excl(obj);
> +       if (dma_fence_is_signaled(fence)) {
> +               RCU_INIT_POINTER(obj->fence_excl, NULL);
> +               dma_fence_put(fence);
> +       }
> +
> +       list = dma_resv_get_list(obj);
> +       if (!list)
> +               return;
> +
> +       for (i = 0; i < list->shared_count; ++i) {
> +               fence = rcu_dereference_protected(list->shared[i],
> +                                                 dma_resv_held(obj));
> +
> +               if (!dma_fence_is_signaled(fence))
> +                       continue;
> +
> +               RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
> +               dma_fence_put(fence);

Not worth reusing the compaction logic from add_shared_fence?
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/10] dma-buf/resv: add dma_resv_prune_fences
  2019-08-21 14:55     ` Chris Wilson
@ 2019-08-21 14:56       ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 14:56 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Chris Wilson (2019-08-21 15:55:08)
> Quoting Christian König (2019-08-21 13:31:42)
> > Add a new dma_resv_prune_fences() function to improve memory management.
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/dma-buf/dma-resv.c                 | 37 ++++++++++++++++++++++
> >  drivers/gpu/drm/i915/gem/i915_gem_wait.c   |  3 +-
> >  drivers/gpu/drm/i915/i915_gem_batch_pool.c |  2 +-
> >  drivers/gpu/drm/i915/i915_vma.c            |  3 +-
> >  drivers/gpu/drm/ttm/ttm_bo.c               |  2 +-
> >  include/linux/dma-resv.h                   |  1 +
> >  6 files changed, 42 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> > index 42a8f3f11681..24adc32d36d4 100644
> > --- a/drivers/dma-buf/dma-resv.c
> > +++ b/drivers/dma-buf/dma-resv.c
> > @@ -301,6 +301,43 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
> >  }
> >  EXPORT_SYMBOL(dma_resv_add_excl_fence);
> >  
> > +/**
> > + * dma_resv_prune_fences - prune signaled fences from the resv object
> > + * @obj: the reservation object to prune
> > + *
> > + * Prune all signaled fences from the reservation object.
> > + */
> > +void dma_resv_prune_fences(struct dma_resv *obj)
> > +{
> > +       struct dma_resv_list *list;
> > +       struct dma_fence *fence;
> > +       unsigned int i;
> > +
> > +       dma_resv_assert_held(obj);
> > +
> > +       fence = dma_resv_get_excl(obj);
> > +       if (dma_fence_is_signaled(fence)) {
> > +               RCU_INIT_POINTER(obj->fence_excl, NULL);
> > +               dma_fence_put(fence);
> > +       }
> > +
> > +       list = dma_resv_get_list(obj);
> > +       if (!list)
> > +               return;
> > +
> > +       for (i = 0; i < list->shared_count; ++i) {
> > +               fence = rcu_dereference_protected(list->shared[i],
> > +                                                 dma_resv_held(obj));
> > +
> > +               if (!dma_fence_is_signaled(fence))
> > +                       continue;
> > +
> > +               RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
> > +               dma_fence_put(fence);
> 
> Not worth reusing the compaction logic from add_shared_fence?

Scratch that, you're going to rewrite the shared fence container.
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 12:31   ` Christian König
@ 2019-08-21 15:24     ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 15:24 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:45)
> @@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>                 busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
>  
>         /* Translate shared fences to READ set of engines */
> -       list = rcu_dereference(obj->base.resv->fence);
> -       if (list) {
> -               unsigned int shared_count = list->shared_count, i;
> -
> -               for (i = 0; i < shared_count; ++i) {
> -                       struct dma_fence *fence =
> -                               rcu_dereference(list->shared[i]);
> -
> -                       args->busy |= busy_check_reader(fence);
> -               }
> -       }
> +       readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
> +       dma_fence_array_for_each(fence, cursor, readers)
> +               args->busy |= busy_check_reader(fence);
> +       dma_fence_put(readers);

That's underwhelming, the full-mb shows up in scaling tests (I'll test
the impact of this series later). Something like,

do {
	read = 0;
	fences = dma_resv_fences_get_deref(&obj->base.resv->readers);
	dma_fence_array_for_each(fence, cursor, fences)
		read |= busy_check_reader(fence);
	smp_rmb();
} while (dma_resv_fences_get_deref(&obj->base.resv->readers) != fences);

do {
	fences = dma_resv_fences_get_deref(&obj->base.resv->writers);
	write = busy_check_writer(fences);
	smp_rmb();
} while (dma_resv_fences_get_deref(&obj->base.resv->writers) != fences);

args->busy = write | read;

Perhaps?
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/10] dma-buf/resv: add new fences container implementation
  2019-08-21 12:31   ` Christian König
@ 2019-08-21 16:04     ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-21 16:04 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On Wed, Aug 21, 2019 at 02:31:44PM +0200, Christian König wrote:
> Add a new container for fences which internally uses
> dma_fence_arrays to store the fences.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-resv.c | 221 +++++++++++++++++++++++++++++++++++++
>  include/linux/dma-resv.h   |  49 ++++++++
>  2 files changed, 270 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index d3a9a3bb15f0..83033b3e8521 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -33,6 +33,7 @@
>   */
>  
>  #include <linux/dma-resv.h>
> +#include <linux/dma-fence-array.h>
>  #include <linux/export.h>
>  
>  /**
> @@ -55,6 +56,226 @@ EXPORT_SYMBOL(reservation_seqcount_class);
>  const char reservation_seqcount_string[] = "reservation_seqcount";
>  EXPORT_SYMBOL(reservation_seqcount_string);
>  
> +static void dma_resv_fences_init(struct dma_resv_fences *fences)
> +{
> +	RCU_INIT_POINTER(fences->fence, NULL);
> +	fences->staged = NULL;
> +}
> +
> +static void dma_resv_fences_fini(struct dma_resv_fences *fences)
> +{
> +	/*
> +	 * This object should be dead and all references must have
> +	 * been released to it, so no need to be protected with rcu.
> +	 */
> +	dma_fence_put(rcu_dereference_protected(fences->fence, true));
> +	dma_fence_array_free(fences->staged);
> +}
> +
> +/**
> + * dma_resv_fences_reserve - allocate fence slots
> + * @fences: fences object where we need slots
> + * @num_fences: number of fence slots we need
> + *
> + * Make sure that the staged dma_fence_array has free slots for at least
> + * @num_fences fences in addition to all the existing ones.
> + *
> + * Returns -ENOMEM on allocation failure, 0 otherwise.
> + */
> +int dma_resv_fences_reserve(struct dma_resv *obj,
> +			    struct dma_resv_fences *fences,
> +			    unsigned int num_fences)
> +{
> +	struct dma_fence *fence = dma_resv_fences_deref(obj, fences);
> +	struct dma_fence_array *staged, *array;
> +	unsigned int i;
> +
> +	array = fences->staged;
> +	if (!array)
> +		array = to_dma_fence_array(fence);
> +
> +	if (array)
> +		num_fences += array->num_fences;
> +	else if (fence)
> +		num_fences += 1;
> +
> +	staged = fences->staged;
> +	if (staged && dma_fence_array_max_fences(staged) >= num_fences)
> +		return 0;
> +
> +	staged = dma_fence_array_alloc(num_fences, NULL);
> +	if (!staged)
> +		return -ENOMEM;
> +
> +	/* Copy over all fences from the old object */
> +	if (array) {
> +		for (i = 0; i < array->num_fences; ++i) {
> +			struct dma_fence *f = array->fences[i];
> +
> +			staged->fences[i] = dma_fence_get(f);
> +		}
> +		staged->num_fences = array->num_fences;
> +
> +	} else if (fence) {
> +		staged->fences[0] = dma_fence_get(fence);
> +		staged->num_fences = 1;
> +
> +	} else {
> +		staged->num_fences = 0;
> +	}
> +
> +	dma_fence_array_free(fences->staged);
> +	fences->staged = staged;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(dma_resv_fences_reserve);
> +
> +/**
> + * dma_resv_fences_assign - set the singleton fence
> + * @fences: fences object where to set the fence
> + * @fence: singleton fence for the object
> + *
> + * Internal helper to assign the singleton fence without grabbing a reference.
> + * If the old fence is a dma_fence_array try to recycle it.
> + */
> +static void dma_resv_fences_assign(struct dma_resv *obj,
> +				   struct dma_resv_fences *fences,
> +				   struct dma_fence *fence)
> +{
> +	struct dma_fence_array *array, *staged;
> +	unsigned int num_fences, i;
> +	struct dma_fence *old;
> +
> +	old = dma_resv_fences_deref(obj, fences);
> +	rcu_assign_pointer(fences->fence, fence);
> +
> +	dma_fence_array_free(fences->staged);
> +	fences->staged = NULL;
> +
> +	/* Try to recycle the old fence array */
> +	staged = to_dma_fence_array(old);
> +	if (!staged)
> +		goto drop_old;
> +
> +	array = to_dma_fence_array(fence);
> +	if (array)
> +		num_fences = array->num_fences;
> +	else
> +		num_fences = fence ? 1 : 0;
> +
> +	if (dma_fence_array_max_fences(staged) < num_fences)
> +		goto drop_old;
> +
> +	/* Try to drop the last reference */
> +	if (!dma_fence_array_recycle(staged))

Without an rcu barrier here you're not syncing to new clients at all.
I don't think this works, and I expect that once you've re-added all the
barriers and retry loops we're back to seqlocks.
-Daniel

> +		return;
> +
> +	/* Make sure the staged array has the latest fences */
> +	if (array) {
> +		for (i = 0; i < array->num_fences; ++i) {
> +			struct dma_fence *f = array->fences[i];
> +
> +			if (f == staged->fences[i])
> +				continue;
> +
> +			dma_fence_put(staged->fences[i]);
> +			staged->fences[i] = dma_fence_get(f);
> +		}
> +		for (;i < staged->num_fences; ++i)
> +			dma_fence_put(staged->fences[i]);
> +		staged->num_fences = array->num_fences;
> +
> +	} else if (fence) {
> +		for (i = 0; i < staged->num_fences; ++i)
> +			dma_fence_put(staged->fences[i]);
> +		staged->fences[0] = dma_fence_get(fence);
> +		staged->num_fences = 1;
> +	} else {
> +		for (i = 0; i < staged->num_fences; ++i)
> +			dma_fence_put(staged->fences[i]);
> +		staged->num_fences = 0;
> +	}
> +
> +	fences->staged = staged;
> +	return;
> +
> +drop_old:
> +	dma_fence_put(old);
> +}
> +
> +/**
> + * dma_resv_fences_set - set the singleton fence
> + * @fences: fences object where to set the fence
> + * @fence: singleton fence for the object
> + *
> + * Grabs a reference to the new fence and replaces the current singleton fence
> + * with a new one. If the old fence is a dma_fence_array try to recycle it.
> + */
> +void dma_resv_fences_set(struct dma_resv *obj,
> +			 struct dma_resv_fences *fences,
> +			 struct dma_fence *fence)
> +{
> +	dma_fence_get(fence);
> +	dma_resv_fences_assign(obj, fences, fence);
> +}
> +EXPORT_SYMBOL(dma_resv_fences_set);
> +
> +/**
> + * dma_resv_fences_add - add a fence to the staged fence_array
> + * @fences: fences object where to add the fence to
> + * @fence: fence to add
> + *
> + * Add a new fence to the staged fence_array.
> + */
> +void dma_resv_fences_add(struct dma_resv_fences *fences,
> +			 struct dma_fence *fence)
> +{
> +	struct dma_fence_array *staged = fences->staged;
> +	struct dma_fence *old;
> +	unsigned int i;
> +
> +#ifndef CONFIG_DEBUG_MUTEXES
> +	for (i = 0; i < staged->num_fences; ++i) {
> +		old = staged->fences[i];
> +
> +		if (old->context == fence->context ||
> +		    dma_fence_is_signaled(old)) {
> +			dma_fence_put(old);
> +			goto replace;
> +		}
> +	}
> +#endif
> +
> +	BUG_ON(staged->num_fences >= dma_fence_array_max_fences(staged));
> +	i = staged->num_fences++;
> +
> +replace:
> +	staged->fences[i] = dma_fence_get(fence);
> +}
> +EXPORT_SYMBOL(dma_resv_fences_add);
> +
> +/**
> + * dma_resv_fences_commit - commit the staged dma_fence_array
> + * @fences: fences object where the commit should happen
> + *
> + * Commit the fences staged in the dma_fence_array and make them visible to
> + * other threads.
> + */
> +void dma_resv_fences_commit(struct dma_resv *obj,
> +			    struct dma_resv_fences *fences)
> +{
> +	struct dma_fence_array *staged = fences->staged;
> +
> +	if (!staged || !staged->num_fences)
> +		return;
> +
> +	fences->staged = NULL;
> +	dma_fence_array_init(staged, dma_fence_context_alloc(1), 1, false);
> +	dma_resv_fences_assign(obj, fences, &staged->base);
> +}
> +EXPORT_SYMBOL(dma_resv_fences_commit);
> +
>  /**
>   * dma_resv_list_alloc - allocate fence list
>   * @shared_max: number of fences we need space for
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index 03b0f95682b0..c70f13fa6789 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -45,10 +45,33 @@
>  #include <linux/seqlock.h>
>  #include <linux/rcupdate.h>
>  
> +struct dma_resv;
> +
>  extern struct ww_class reservation_ww_class;
>  extern struct lock_class_key reservation_seqcount_class;
>  extern const char reservation_seqcount_string[];
>  
> +/**
> + * struct dma_resv_fences - fences inside a reservation object
> + * @fence: the current RCU protected singleton fence
> + * @staged: optional staged dma_fence_array to replace @fence
> + */
> +struct dma_resv_fences {
> +	struct dma_fence __rcu *fence;
> +	struct dma_fence_array *staged;
> +};
> +
> +int dma_resv_fences_reserve(struct dma_resv *obj,
> +			    struct dma_resv_fences *fences,
> +			    unsigned int num_fences);
> +void dma_resv_fences_set(struct dma_resv *obj,
> +			 struct dma_resv_fences *fences,
> +			 struct dma_fence *fence);
> +void dma_resv_fences_add(struct dma_resv_fences *fences,
> +			 struct dma_fence *fence);
> +void dma_resv_fences_commit(struct dma_resv *obj,
> +			    struct dma_resv_fences *fences);
> +
>  /**
>   * struct dma_resv_list - a list of shared fences
>   * @rcu: for internal use
> @@ -80,6 +103,32 @@ struct dma_resv {
>  #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
>  #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base)
>  
> +/**
> + * dma_resv_fences_deref - get singleton fence
> + * @obj: the reservation object
> + * @fences: the fences object
> + *
> + * Returns the singleton fence from a resv_fences object.
> + */
> +static inline struct dma_fence *
> +dma_resv_fences_deref(struct dma_resv *obj, struct dma_resv_fences *fences)
> +{
> +	return rcu_dereference_protected(fences->fence,
> +					 dma_resv_held(obj));
> +}
> +
> +/**
> + * dma_resv_fences_get_rcu - RCU get single fence
> + * @fences: fences structure where we need to get a reference for
> + *
> + * Get a reference to the single fence representing the synchronization.
> + */
> +static inline struct dma_fence *
> +dma_resv_fences_get_rcu(struct dma_resv_fences *fences)
> +{
> +	return dma_fence_get_rcu_safe(&fences->fence);
> +}
> +
>  /**
>   * dma_resv_get_list - get the reservation object's
>   * shared fence list, with update-side lock held
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 16:13   ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-21 16:13 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On Wed, Aug 21, 2019 at 02:31:37PM +0200, Christian König wrote:
> Hi everyone,
> 
> In previous discussion it surfaced that different drivers use the shared
> and explicit fences in the dma_resv object with different meanings.
> 
> This is problematic when we share buffers between those drivers and
> requirements for implicit and explicit synchronization leaded to quite a
> number of workarounds related to this.
> 
> So I started an effort to get all drivers back to a common understanding
> of what the fences in the dma_resv object mean and be able to use the
> object for different kind of workloads independent of the classic DRM
> command submission interface.
> 
> The result is this patch set which modifies the dma_resv API to get away
> from a single explicit fence and multiple shared fences, towards a
> notation where we have explicit categories for writers, readers and
> others.
> 
> To do this I came up with a new container called dma_resv_fences which
> can store both a single fence as well as multiple fences in a
> dma_fence_array.
> 
> This turned out to actually be even be quite a bit simpler, since we
> don't need any complicated dance between RCU and sequence count
> protected updates any more.
> 
> Instead we can just grab a reference to the dma_fence_array under RCU
> and so keep the current state of synchronization alive until we are done
> with it.
> 
> This results in both a small performance improvement since we don't need
> so many barriers any more, as well as fewer lines of code in the actual
> implementation.

I think you traded correctness for the lack of barriers/retry loops here,
see my reply later on. But I haven't grokked the full thing in detail, so
I might easily have missed something.

But high level first, and I don't get this at all. Current state:

Ill defined semantics, no docs. You have to look at the implementations.

New state after your patch series:

Ill defined semantics (but hey different!), no docs. You still have to
look at the implementations to understand what's going on.

I think what has actually changed (aside from the entire implementation)
is just these three things:
- we now allow multiple exclusive fences
- exclusive was renamed to writer fences, shared to reader fences
- there's a new "other" group, for ... otherworldly fences?

Afaiui we have the following two issues with the current fence semantics:
- amdgpu came up with a totally different notion of implicit sync, using
  the owner to figure out when to sync. I have no idea at all how that
  meshes with multiple writers, but I guess there's a connection.
- amdkfd does a very fancy eviction/preempt fence. Is that what the other
  bucket is for?

I guess I could read the amdgpu/ttm code in very fine detail and figure
this out, but I really don't see how that's moving stuff forward.

Also, I think it'd be really good to decouple semantic changes from
implementation changes, because untangling them if we have to revert one
or the other is going to be nigh impossible. And dma_* is not really an
area where we can proudly claim that reverts don't happen.

Cheers, Daniel

> 
> Please review and/or comment,
> Christian. 
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 12:31   ` Christian König
@ 2019-08-21 16:21     ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 16:21 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:45)
> @@ -528,20 +352,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
>                 dma_fence_put(fence);
>         }
>  
> -       list = dma_resv_get_list(obj);
> -       if (!list)
> -               return;
> -
> -       for (i = 0; i < list->shared_count; ++i) {
> -               fence = rcu_dereference_protected(list->shared[i],
> -                                                 dma_resv_held(obj));
> -
> -               if (!dma_fence_is_signaled(fence))
> -                       continue;
> -
> -               RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
> -               dma_fence_put(fence);
> -       }
> +       fence = dma_resv_fences_deref(obj, &obj->readers);
> +       if (dma_fence_is_signaled(fence))
> +               dma_resv_fences_set(obj, &obj->readers, NULL);

Needs fence==NULL safeguards.
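
I.e. presumably (untested)

	fence = dma_resv_fences_deref(obj, &obj->readers);
	if (fence && dma_fence_is_signaled(fence))
		dma_resv_fences_set(obj, &obj->readers, NULL);

since dma_fence_is_signaled() doesn't cope with a NULL fence.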

It's not as pruny as it was before. Or did you add some magic to
fence-array?
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/10] dma-buf: add dma_fence_array_recycle
  2019-08-21 12:31   ` Christian König
@ 2019-08-21 16:24     ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 16:24 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:40)
> Try to recycle a dma_fence_array object by dropping the last
> reference to it without freeing it.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence-array.c | 27 +++++++++++++++++++++++++++
>  include/linux/dma-fence-array.h   |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index e82f30551aa1..314cf0e881d9 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -188,6 +188,33 @@ void dma_fence_array_init(struct dma_fence_array *array, u64 context,
>  }
>  EXPORT_SYMBOL(dma_fence_array_init);
>  
> +/**
> + * dma_fence_array_reuse - dummy for dma_fence_array_recycle
> + */
> +static void dma_fence_array_reuse(struct kref *kref)
> +{
> +       struct dma_fence_array *array = container_of(kref, typeof(*array),
> +                                                    base.refcount);
> +
> +       WARN_ON_ONCE(!list_empty(&array->base.cb_list));

[   77.584694] WARNING: CPU: 3 PID: 627 at drivers/dma-buf/dma-fence-array.c:199 dma_fence_array_recycle+0x1d/0x20
[   77.584702] Modules linked in: nls_ascii nls_cp437 vfat fat crct10dif_pclmul i915 crc32_pclmul crc32c_intel aesni_intel aes_x86_64 glue_helper crypto_simd cryptd intel_cstate intel_gtt drm_kms_helper intel_uncore intel_rapl_perf ahci i2c_i801 libahci efivars video button efivarfs
[   77.584716] CPU: 3 PID: 627 Comm: gem_busy Tainted: G     U            5.3.0-rc5+ #72
[   77.584719] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[   77.584723] RIP: 0010:dma_fence_array_recycle+0x1d/0x20
[   77.584726] Code: 5d c3 5b 5d e9 54 f5 ff ff 0f 1f 40 00 f0 ff 4f 38 0f 88 d3 85 14 00 0f 94 c0 74 01 c3 48 8b 57 10 48 83 c7 10 48 39 d7 74 f2 <0f> 0b c3 48 c7 c0 62 88 be 81 c3 0f 1f 84 00 00 00 00 00 48 c7 c0
[   77.584730] RSP: 0018:ffffc900001c3c78 EFLAGS: 00010292
[   77.584733] RAX: 0000000000000001 RBX: ffff88885ac14b40 RCX: cccccccccccccccd
[   77.584735] RDX: 00000012105b0a77 RSI: 0000000000000000 RDI: ffff88885ac14b50
[   77.584737] RBP: 0000000000000000 R08: 0000000000000004 R09: 000000000001de00
[   77.584740] R10: 00000030f5224012 R11: 00000000000002df R12: ffff88885b58aff0
[   77.584742] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88885748e100
[   77.584745] FS:  00007f1cd47b19c0(0000) GS:ffff88885eb80000(0000) knlGS:0000000000000000
[   77.584748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   77.584750] CR2: 00007f1cd6719000 CR3: 000000085ac5b002 CR4: 00000000001606e0
[   77.584753] Call Trace:
[   77.584758]  dma_resv_fences_assign.isra.7+0x8d/0x220
[   77.584761]  dma_resv_prune_fences+0xba/0xc0
[   77.584796]  i915_gem_object_wait_reservation+0x226/0x240 [i915]
[   77.584827]  i915_gem_object_wait+0x23/0x40 [i915]
[   77.584854]  i915_gem_set_domain_ioctl+0xbf/0x240 [i915]

If we signal the fence-array, the cb_list is replaced by the timestamp
and is no longer considered empty.
-Chris
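
For context, the overlap described here comes from the union in struct
dma_fence; roughly (trimmed from include/linux/dma-fence.h of that era,
so the exact layout may differ by tree):

    struct dma_fence {
            /* ... */
            union {
                    struct list_head cb_list;
                    /* @cb_list replaced by @timestamp on dma_fence_signal() */
                    ktime_t timestamp;
                    /* @timestamp replaced by @rcu on dma_fence_release() */
                    struct rcu_head rcu;
            };
            /* ... */
    };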

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 15:24     ` Chris Wilson
@ 2019-08-21 17:35       ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 17:35 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Chris Wilson (2019-08-21 16:24:22)
> Quoting Christian König (2019-08-21 13:31:45)
> > @@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
> >                 busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
> >  
> >         /* Translate shared fences to READ set of engines */
> > -       list = rcu_dereference(obj->base.resv->fence);
> > -       if (list) {
> > -               unsigned int shared_count = list->shared_count, i;
> > -
> > -               for (i = 0; i < shared_count; ++i) {
> > -                       struct dma_fence *fence =
> > -                               rcu_dereference(list->shared[i]);
> > -
> > -                       args->busy |= busy_check_reader(fence);
> > -               }
> > -       }
> > +       readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
> > +       dma_fence_array_for_each(fence, cursor, readers)
> > +               args->busy |= busy_check_reader(fence);
> > +       dma_fence_put(readers);
> 
> That's underwhelming, the full-mb shows up in scaling tests (I'll test
> the impact of this series later). Something like,

To put some numbers to it, adding the full-mb adds 5ns to a single
thread on Kabylake and 20ns under contention.
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 16:13   ` Daniel Vetter
@ 2019-08-21 20:05     ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-21 20:05 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On Wed, Aug 21, 2019 at 06:13:27PM +0200, Daniel Vetter wrote:
> On Wed, Aug 21, 2019 at 02:31:37PM +0200, Christian König wrote:
> > Hi everyone,
> > 
> > In previous discussion it surfaced that different drivers use the shared
> > and explicit fences in the dma_resv object with different meanings.
> > 
> > This is problematic when we share buffers between those drivers and
> > requirements for implicit and explicit synchronization led to quite a
> > number of workarounds related to this.
> > 
> > So I started an effort to get all drivers back to a common understanding
> > of what the fences in the dma_resv object mean and be able to use the
> > object for different kinds of workloads independent of the classic DRM
> > command submission interface.
> > 
> > The result is this patch set which modifies the dma_resv API to get away
> > from a single explicit fence and multiple shared fences, towards a
> > notation where we have explicit categories for writers, readers and
> > others.
> > 
> > To do this I came up with a new container called dma_resv_fences which
> > can store both a single fence as well as multiple fences in a
> > dma_fence_array.
> > 
> > This actually turned out to be quite a bit simpler, since we
> > don't need any complicated dance between RCU and sequence count
> > protected updates any more.
> > 
> > Instead we can just grab a reference to the dma_fence_array under RCU
> > and so keep the current state of synchronization alive until we are done
> > with it.
> > 
> > This results in both a small performance improvement since we don't need
> > so many barriers any more, as well as fewer lines of code in the actual
> > implementation.
> 
> I think you traded correctness for the lack of barriers/retry loops here,
> see my reply later on. But I haven't grokked the full thing in detail, so
> I might easily have missed something.
> 
> But high level first, and I don't get this at all. Current state:
> 
> Ill-defined semantics, no docs. You have to look at the implementations.
> 
> New state after you patch series:
> 
> Ill-defined semantics (but hey, different!), no docs. You still have to
> look at the implementations to understand what's going on.
> 
> I think what has actually changed (aside from the entire implementation)
> is just these three things:
> - we now allow multiple exclusive fences

This isn't really new: you could already attach a dma_fence_array to the
exclusive slot.
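
A sketch of that with the pre-series API (error handling omitted;
dma_fence_array_create() takes ownership of the heap-allocated fences
array and of the references stored in it):

    struct dma_fence **fences;
    struct dma_fence_array *array;

    fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
    fences[0] = dma_fence_get(fence_a);
    fences[1] = dma_fence_get(fence_b);

    array = dma_fence_array_create(2, fences,
                                   dma_fence_context_alloc(1), 1,
                                   false);

    /* caller must hold resv->lock; the resv takes its own reference */
    dma_resv_add_excl_fence(resv, &array->base);
    dma_fence_put(&array->base);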

> - exclusive was renamed to writer fences, shared to reader fences

Bit more context on why I think this is a pure bikeshed: We've had (what at
least felt like) a multi-year bikeshed on what to call these, with the two
options writers/readers and exclusive/shared. Somehow (it's not documented,
hooray) we ended up going with exclusive/shared. Switching that over to the
other bikeshed again, still without documenting what exactly you should be
putting there (since amdgpu still doesn't always fill out the writer,
because that's not how amdgpu works), feels really silly.

> - there's a new "other" group, for ... otherworldly fences?

I guess this is to better handle the amdkfd magic fence, or the vm fences?
Still no idea since not used.

One other thing I've found while trying to figure out your motivation here
(since I'm not getting what you're aiming at) is that setting the exclusive
fence through the old interface now sets both exclusive and shared fences.

I guess if that's all (I'm assuming I'm blind) we can just add a "give me
all the fences" interface, and use that for the drivers that want that.

> Afaiui we have the following two issues with the current fence semantics:
> - amdgpu came up with a totally different notion of implicit sync, using
>   the owner to figure out when to sync. I have no idea at all how that
>   meshes with multiple writers, but I guess there's a connection.
> - amdkfd does a very fancy eviction/preempt fence. Is that what the other
>   bucket is for?
> 
> I guess I could read the amdgpu/ttm code in very fine detail and figure
> this out, but I really don't see how that's moving stuff forward.
> 
> Also, I think it'd be really good to decouple semantic changes from
> implementation changes, because untangling them if we have to revert one
> or the other is going to be nigh impossible. And dma_* is not really an
> area where we can proudly claim that reverts don't happen.

I think we should go even further with this, and start earlier.

step 1: Document the current semantics.

Once we have that, we can look at the amdkfd and amdgpu vm stuff and
whatever else there is, and figure out what's missing. Maybe even throw in
the exact thing you're doing in amdkfd/gpu into the above documentation,
in an effort to cover what's done. I can add some entertaining things from
i915's side too :-)

And I mean actual real docs that explain stuff, not oneliner kerneldocs
for functions and that's it. Without that I think we'll just move in
circles and go nowhere at all.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 12:31 ` Christian König
@ 2019-08-21 20:11   ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-21 20:11 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:37)
> Hi everyone,
> 
> In previous discussion it surfaced that different drivers use the shared and explicit fences in the dma_resv object with different meanings.
> 
> This is problematic when we share buffers between those drivers and requirements for implicit and explicit synchronization led to quite a number of workarounds related to this.
> 
> So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean and be able to use the object for different kinds of workloads independent of the classic DRM command submission interface.
> 
> The result is this patch set which modifies the dma_resv API to get away from a single explicit fence and multiple shared fences, towards a notation where we have explicit categories for writers, readers and others.

Fwiw, I would like the distinction here between optional fences
(writers, readers) and mandatory fences (others). The optional fences
are where we put the implicit fence tracking that clever userspace would
rather avoid. The mandatory fences (I would call them internal) are where
we put the fences for tracking migration that userspace cannot opt out of.
-Chris
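
Mapped onto the categories of the series, the split proposed here would
look roughly like this (annotations only; whether migration fences really
end up in "others" is exactly the open question):

    struct dma_resv {
            struct ww_mutex lock;

            /* optional: implicit sync, clever userspace may opt out */
            struct dma_resv_fences writers;
            struct dma_resv_fences readers;

            /* mandatory/internal: e.g. migration, no opt-out */
            struct dma_resv_fences others;
    };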

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 20:11   ` Chris Wilson
@ 2019-08-21 20:22     ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-21 20:22 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Christian König, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK,
	open list:DMA BUFFER SHARING FRAMEWORK, Sumit Semwal

On Wed, Aug 21, 2019 at 10:11 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Christian König (2019-08-21 13:31:37)
> > Hi everyone,
> >
> > In previous discussion it surfaced that different drivers use the shared and explicit fences in the dma_resv object with different meanings.
> >
> > This is problematic when we share buffers between those drivers and requirements for implicit and explicit synchronization led to quite a number of workarounds related to this.
> >
> > So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean and be able to use the object for different kinds of workloads independent of the classic DRM command submission interface.
> >
> > The result is this patch set which modifies the dma_resv API to get away from a single explicit fence and multiple shared fences, towards a notation where we have explicit categories for writers, readers and others.
>
> Fwiw, I would like the distinction here between optional fences
> (writers, readers) and mandatory fences (others). The optional fences
> are where we put the implicit fence tracking that clever userspace would
> rather avoid. The mandatory fences (I would call internal) is where we
> put the fences for tracking migration that userspace can not opt out of.

I think this would make sense, and is kinda what I expected here. If
(and I think that's a huge if) we can agree on what those internal
fences are. There's a huge difference between internal fences for
buffer moves (better not ignore those) and internal fences like
amdkfd's eviction fence (better ignore those). So whatever we do add,
it better come with really clear docs and pretty diagrams about what
it's supposed to do, and how it's supposed to be used. Or we're just
back to the current mess we're in, times two.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/10] dma-buf/resv: add new fences container implementation
  2019-08-21 16:04     ` Daniel Vetter
@ 2019-08-22  8:23       ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  8:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

Am 21.08.19 um 18:04 schrieb Daniel Vetter:
> On Wed, Aug 21, 2019 at 02:31:44PM +0200, Christian König wrote:
>> [SNIP]
>> +	/* Try to drop the last reference */
>> +	if (!dma_fence_array_recycle(staged))
> Without an rcu barrier here you're not syncing to new clients at all.
> I don't think this works, and I expect that once you've readded all the
> barriers and retry loops we're back to seqlocks.

The key difference is that RCU users now use dma_fence_get_rcu_safe() to 
grab a reference to the current set of fences.

In other words the whole array is reference counted and RCU protected 
instead of each individual entry in the array.

This way you don't need the sequence count any more because you grab a 
reference to all of them at once and then can be sure that they don't 
change.

Regards,
Christian.

> -Daniel
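
The read-side pattern described above, as a sketch
(dma_resv_fences_get_rcu() and dma_fence_array_for_each() as used
elsewhere in the series; the cursor type name is assumed):

    struct dma_fence *snapshot, *fence;
    struct dma_fence_array_cursor cursor;   /* cursor type name assumed */

    /* one reference grab under RCU; no seqlock, no retry loop */
    snapshot = dma_resv_fences_get_rcu(&resv->readers);

    dma_fence_array_for_each(fence, cursor, snapshot)
            check(fence);   /* the set cannot change while we hold snapshot */

    dma_fence_put(snapshot);        /* dma_fence_put() tolerates NULL */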


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 17:35       ` Chris Wilson
@ 2019-08-22  8:37         ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  8:37 UTC (permalink / raw)
  To: Chris Wilson, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Am 21.08.19 um 19:35 schrieb Chris Wilson:
> Quoting Chris Wilson (2019-08-21 16:24:22)
>> Quoting Christian König (2019-08-21 13:31:45)
>>> @@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>>>                  busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
>>>   
>>>          /* Translate shared fences to READ set of engines */
>>> -       list = rcu_dereference(obj->base.resv->fence);
>>> -       if (list) {
>>> -               unsigned int shared_count = list->shared_count, i;
>>> -
>>> -               for (i = 0; i < shared_count; ++i) {
>>> -                       struct dma_fence *fence =
>>> -                               rcu_dereference(list->shared[i]);
>>> -
>>> -                       args->busy |= busy_check_reader(fence);
>>> -               }
>>> -       }
>>> +       readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
>>> +       dma_fence_array_for_each(fence, cursor, readers)
>>> +               args->busy |= busy_check_reader(fence);
>>> +       dma_fence_put(readers);
>> That's underwhelming, the full-mb shows up in scaling tests (I'll test
>> the impact of this series later). Something like,
> To put some numbers to it, adding the full-mb adds 5ns to a single
> thread on Kabylake and 20ns under contention.

The question is if that's the use case we want to optimize for.

Querying a buffer for busyness is something we do only rarely on
amdgpu, e.g. IIRC we even grab the full reservation lock for this.

But adding new fences comes with every command submission.

What could maybe work is the "do { } while (fence has changed);" loop you
suggested earlier in this mail thread, but I need to double-check whether
that would really work or clash with recycling dma_fence_arrays().

Christian.

> -Chris


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/10] dma-buf: add dma_fence_array_recycle
  2019-08-21 16:24     ` Chris Wilson
@ 2019-08-22  8:38       ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  8:38 UTC (permalink / raw)
  To: Chris Wilson, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Am 21.08.19 um 18:24 schrieb Chris Wilson:
> Quoting Christian König (2019-08-21 13:31:40)
>> Try to recycle an dma_fence_array object by dropping the last
>> reference to it without freeing it.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/dma-buf/dma-fence-array.c | 27 +++++++++++++++++++++++++++
>>   include/linux/dma-fence-array.h   |  1 +
>>   2 files changed, 28 insertions(+)
>>
>> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
>> index e82f30551aa1..314cf0e881d9 100644
>> --- a/drivers/dma-buf/dma-fence-array.c
>> +++ b/drivers/dma-buf/dma-fence-array.c
>> @@ -188,6 +188,33 @@ void dma_fence_array_init(struct dma_fence_array *array, u64 context,
>>   }
>>   EXPORT_SYMBOL(dma_fence_array_init);
>>   
>> +/**
>> + * dma_fence_array_reuse - dummy for dma_fence_array_recycle
>> + */
>> +static void dma_fence_array_reuse(struct kref *kref)
>> +{
>> +       struct dma_fence_array *array = container_of(kref, typeof(*array),
>> +                                                    base.refcount);
>> +
>> +       WARN_ON_ONCE(!list_empty(&array->base.cb_list));
> [   77.584694] WARNING: CPU: 3 PID: 627 at drivers/dma-buf/dma-fence-array.c:199 dma_fence_array_recycle+0x1d/0x20
> [   77.584702] Modules linked in: nls_ascii nls_cp437 vfat fat crct10dif_pclmul i915 crc32_pclmul crc32c_intel aesni_intel aes_x86_64 glue_helper crypto_simd cryptd intel_cstate intel_gtt drm_kms_helper intel_uncore intel_rapl_perf ahci i2c_i801 libahci efivars video button efivarfs
> [   77.584716] CPU: 3 PID: 627 Comm: gem_busy Tainted: G     U            5.3.0-rc5+ #72
> [   77.584719] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
> [   77.584723] RIP: 0010:dma_fence_array_recycle+0x1d/0x20
> [   77.584726] Code: 5d c3 5b 5d e9 54 f5 ff ff 0f 1f 40 00 f0 ff 4f 38 0f 88 d3 85 14 00 0f 94 c0 74 01 c3 48 8b 57 10 48 83 c7 10 48 39 d7 74 f2 <0f> 0b c3 48 c7 c0 62 88 be 81 c3 0f 1f 84 00 00 00 00 00 48 c7 c0
> [   77.584730] RSP: 0018:ffffc900001c3c78 EFLAGS: 00010292
> [   77.584733] RAX: 0000000000000001 RBX: ffff88885ac14b40 RCX: cccccccccccccccd
> [   77.584735] RDX: 00000012105b0a77 RSI: 0000000000000000 RDI: ffff88885ac14b50
> [   77.584737] RBP: 0000000000000000 R08: 0000000000000004 R09: 000000000001de00
> [   77.584740] R10: 00000030f5224012 R11: 00000000000002df R12: ffff88885b58aff0
> [   77.584742] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88885748e100
> [   77.584745] FS:  00007f1cd47b19c0(0000) GS:ffff88885eb80000(0000) knlGS:0000000000000000
> [   77.584748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   77.584750] CR2: 00007f1cd6719000 CR3: 000000085ac5b002 CR4: 00000000001606e0
> [   77.584753] Call Trace:
> [   77.584758]  dma_resv_fences_assign.isra.7+0x8d/0x220
> [   77.584761]  dma_resv_prune_fences+0xba/0xc0
> [   77.584796]  i915_gem_object_wait_reservation+0x226/0x240 [i915]
> [   77.584827]  i915_gem_object_wait+0x23/0x40 [i915]
> [   77.584854]  i915_gem_set_domain_ioctl+0xbf/0x240 [i915]
>
> If we signal the fence-array, the cb_list is replaced by the timestamp
> and is no longer considered empty.

Ah, yeah, good point. I actually wrote that before cb_list and timestamp 
shared the same memory.

Christian.

> -Chris


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 20:22     ` Daniel Vetter
@ 2019-08-22  9:14       ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  9:14 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson
  Cc: dri-devel, moderated list:DMA BUFFER SHARING FRAMEWORK,
	open list:DMA BUFFER SHARING FRAMEWORK, Sumit Semwal

Am 21.08.19 um 22:22 schrieb Daniel Vetter:
> On Wed, Aug 21, 2019 at 10:11 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> Quoting Christian König (2019-08-21 13:31:37)
>>> Hi everyone,
>>>
>>> In previous discussion it surfaced that different drivers use the shared and explicit fences in the dma_resv object with different meanings.
>>>
>>> This is problematic when we share buffers between those drivers and requirements for implicit and explicit synchronization led to quite a number of workarounds related to this.
>>>
>>> So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean and be able to use the object for different kinds of workloads independent of the classic DRM command submission interface.
>>>
>>> The result is this patch set which modifies the dma_resv API to get away from a single explicit fence and multiple shared fences, towards a notation where we have explicit categories for writers, readers and others.
>> Fwiw, I would like the distinction here between optional fences
>> (writers, readers) and mandatory fences (others). The optional fences
>> are where we put the implicit fence tracking that clever userspace would
>> rather avoid. The mandatory fences (I would call internal) is where we
>> put the fences for tracking migration that userspace can not opt out of.
> I think this would make sense, and is kinda what I expected here.

Yeah, exactly that's the intention here.

The basic idea is to group the fences into the categories "you always 
need to wait for these when doing implicit synchronization" (writers), 
"you only need to wait for these when you want to write to the object" 
(readers) and "ignore these for implicit synchronization" (others).
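
As a sketch of that rule (hypothetical helper; dma_resv_wait_fences()
stands in for whatever the real wait primitive ends up being):

    /*
     * Implicit sync: a new reader waits for writers only, a new
     * writer waits for writers and readers, and "others" never
     * participate in implicit sync.
     */
    static long sync_implicit(struct dma_resv *resv, bool write, long timeout)
    {
            timeout = dma_resv_wait_fences(resv, &resv->writers, timeout);
            if (timeout <= 0 || !write)
                    return timeout;

            return dma_resv_wait_fences(resv, &resv->readers, timeout);
    }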

> If (and I think that's a huge if) we can agree on what those internal
> fences are. There's a huge difference between internal fences for
> buffer moves (better not ignore those) and internal fences like
> amdkfd's eviction fence (better ignore those).

Yeah, that's exactly why I want to get away from those exclusive/shared 
naming.

For readers/writers I hoped the semantics would be clearer, but that 
doesn't seem to be the case.

> So whatever we do add, it better come with really clear docs and pretty diagrams about what
> it's supposed to do, and how it's supposed to be used. Or we're just
> back to the current mess we're in, times two.

Well, documenting it in the end is clearly a good idea, but I don't think 
we should start with that before we actually know what we want to 
implement and how we want to implement it.

Otherwise I would write tons of documentation which would just be thrown 
away again in the end because we decided not to do it this way.

Christian.

> -Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-22  8:37         ` Christian König
@ 2019-08-22  9:16           ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  9:16 UTC (permalink / raw)
  To: Chris Wilson, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Am 22.08.19 um 10:37 schrieb Christian König:
> Am 21.08.19 um 19:35 schrieb Chris Wilson:
>> Quoting Chris Wilson (2019-08-21 16:24:22)
>>> Quoting Christian König (2019-08-21 13:31:45)
>>>> @@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, 
>>>> void *data,
>>>> busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
>>>>            /* Translate shared fences to READ set of engines */
>>>> -       list = rcu_dereference(obj->base.resv->fence);
>>>> -       if (list) {
>>>> -               unsigned int shared_count = list->shared_count, i;
>>>> -
>>>> -               for (i = 0; i < shared_count; ++i) {
>>>> -                       struct dma_fence *fence =
>>>> - rcu_dereference(list->shared[i]);
>>>> -
>>>> -                       args->busy |= busy_check_reader(fence);
>>>> -               }
>>>> -       }
>>>> +       readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
>>>> +       dma_fence_array_for_each(fence, cursor, readers)
>>>> +               args->busy |= busy_check_reader(fence);
>>>> +       dma_fence_put(readers);
>>> That's underwhelming, the full-mb shows up in scaling tests (I'll test
>>> the impact of this series later). Something like,
>> To put some numbers to it, adding the full-mb adds 5ns to a single
>> thread on Kabylake and 20ns under contention.
>
> The question is if that's the use case we want to optimize for.
>
> Querying a buffer for business is something we do absolutely rarely on 
> amdgpu, e.g. IIRC we even grab the full reservation lock for this.
>
> But adding new fences comes with every command submission.
>
> What could maybe work is the "do { } while (fence has changed); loop 
> you suggested earlier in this mail thread, but I need to double check 
> if that would really work or clash with recycling dma_fence_arrays().

Ok, thinking about it some more, that won't work either.

See, the dma_fence_array is only guaranteed not to change while you hold a 
reference to it. Otherwise we don't guarantee anything.

Christian.

>
> Christian.
>
>> -Chris
>
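
Roughly why a pointer-compare retry loop breaks down once arrays can be
recycled (illustrative; the __rcu fence pointer inside dma_resv_fences is
an assumed name):

    struct dma_fence *array;

    do {
            rcu_read_lock();
            array = rcu_dereference(resv->readers.fence);
            /* ... read fences out of the array ... */
            rcu_read_unlock();
    } while (array != rcu_access_pointer(resv->readers.fence));

    /*
     * With recycling the refcount can hit zero and the very same object
     * be reused in place while we look at it: the pointer compares
     * equal, but the contents changed underneath.  The snapshot is only
     * stable while we hold a reference.
     */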


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-21 20:05     ` Daniel Vetter
@ 2019-08-22  9:27       ` Christian König
  -1 siblings, 0 replies; 66+ messages in thread
From: Christian König @ 2019-08-22  9:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On 21.08.19 at 22:05, Daniel Vetter wrote:
> On Wed, Aug 21, 2019 at 06:13:27PM +0200, Daniel Vetter wrote:
>> On Wed, Aug 21, 2019 at 02:31:37PM +0200, Christian König wrote:
>>> Hi everyone,
>>>
>>> In previous discussion it surfaced that different drivers use the shared
>>> and explicit fences in the dma_resv object with different meanings.
>>>
>>> This is problematic when we share buffers between those drivers and
>>> requirements for implicit and explicit synchronization leaded to quite a
>>> number of workarounds related to this.
>>>
>>> So I started an effort to get all drivers back to a common understanding
>>> of what the fences in the dma_resv object mean and be able to use the
>>> object for different kind of workloads independent of the classic DRM
>>> command submission interface.
>>>
>>> The result is this patch set which modifies the dma_resv API to get away
>>> from a single explicit fence and multiple shared fences, towards a
>>> notation where we have explicit categories for writers, readers and
>>> others.
>>>
>>> To do this I came up with a new container called dma_resv_fences which
>>> can store both a single fence as well as multiple fences in a
>>> dma_fence_array.
>>>
>>> This turned out to actually be even be quite a bit simpler, since we
>>> don't need any complicated dance between RCU and sequence count
>>> protected updates any more.
>>>
>>> Instead we can just grab a reference to the dma_fence_array under RCU
>>> and so keep the current state of synchronization alive until we are done
>>> with it.
>>>
>>> This results in both a small performance improvement since we don't need
>>> so many barriers any more, as well as fewer lines of code in the actual
>>> implementation.
>> I think you traded lack of barriers/retry loops for correctness here, see
>> reply later on. But I haven't grokked the full thing in detail, so easily
>> might have missed something.
>>
>> But high level first, and I don't get this at all. Current state:
>>
>> Ill-defined semantics, no docs. You have to look at the implementations.
>>
>> New state after your patch series:
>>
>> Ill-defined semantics (but hey, different!), no docs. You still have to
>> look at the implementations to understand what's going on.
>>
>> I think what has actually changed (aside from the entire implementation)
>> is just these three things:
>> - we now allow multiple exclusive fences
> This isn't really new, you could just attach a dma_fence_array already to
> the exclusive slot. So not really new either.

Correct, the problem is really that in this case we still wouldn't have 
a clear semantic of what means what.

>> - exclusive was renamed to writer fences, shared to reader fences
> Bit more context why I think this is a pure bikeshed: We've had (what at
> least felt like) a multi-year bikeshed on what to call these, with the two
> options writer/readers and exclusive/shared. Somehow (it's not documented,
hooray) we ended up going with exclusive/shared. Switching that over to the
> other bikeshed again, still without documenting what exactly you should be
> putting there (since amdgpu still doesn't always fill out the writer,
> because that's not how amdgpu works), feels really silly.

I simply haven't changed the implementation in amdgpu because I wanted to 
negotiate what we are actually going to do first.

>> - there's a new "other" group, for ... otherworldly fences?
> I guess this is to better handle the amdkfd magic fence, or the vm fences?

Both; this is simply for fences which don't participate in implicit 
synchronization at all.

> Still no idea since not used.
>
> One other thing I've found while trying to figure out your motivation here
> (since I'm not getting what you're aiming at) is that setting the exclusive
> fence through the old interface now sets both exclusive and shared fences.
>
> I guess if that's all (I'm assuming I'm blind) we can just add a "give me
> all the fences" interface, and use that for the drivers that want that.
>
>> Afaiui we have the following two issues with the current fence semantics:
>> - amdgpu came up with a totally different notion of implicit sync, using
>>    the owner to figure out when to sync. I have no idea at all how that
>>    meshes with multiple writers, but I guess there's a connection.
>> - amdkfd does a very fancy eviction/preempt fence. Is that what the other
>>    bucket is for?
>>
>> I guess I could read the amdgpu/ttm code in very fine detail and figure
>> this out, but I really don't see how that's moving stuff forward.
>>
>> Also, I think it'd be really good to decouple semantic changes from
>> implementation changes, because untangling them if we have to revert one
>> or the other is going to be nigh impossible. And dma_* is not really an
>> area where we can proudly claim that reverts don't happen.
> I think we should go even further with this, and start earlier.
>
> step 1: Document the current semantics.

I don't think that this is a good idea, because we don't have clear 
current semantics.

What we have is a container with fences and no definition of what those 
fences mean.

We would just spend a lot of time documenting that we messed it up, 
with no gain at all.

The aim of this patch set is to:
a) replace the current container with something which can be re-used 
multiple times, and
b) define what the fences in the container actually mean.
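
For reference, the rough shape this gives us is (sketch only; the 
dma_resv_fences internals from patch 7 are omitted):

	struct dma_resv {
		struct ww_mutex lock;

		/* everybody must wait for these for implicit sync */
		struct dma_resv_fences writers;

		/* only writers must wait for these */
		struct dma_resv_fences readers;

		/* never participate in implicit sync */
		struct dma_resv_fences others;
	};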

I mixed those two goals up in a single patch and you are absolutely 
correct that this wasn't a good idea, going to fix that for the next 
iteration.

Maybe it then becomes clearer what I'm trying to do here,
Christian.

> Once we have that, we can look at the amdkfd and amdgpu vm stuff and
> whatever else there is, and figure out what's missing. Maybe even throw in
> the exact thing you're doing in amdkfd/gpu into the above documentation,
> in an effort to cover what's done. I can add some entertaining things from
> i915's side too :-)
>
> And I mean actual real docs that explain stuff, not oneliner kerneldocs
> for functions and that's it. Without that I think we'll just move in
> circles and go nowhere at all.
> -Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC] replacing dma_resv API
  2019-08-22  9:14       ` Christian König
@ 2019-08-22 10:00         ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-22 10:00 UTC (permalink / raw)
  To: Christian König
  Cc: Chris Wilson, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK,
	open list:DMA BUFFER SHARING FRAMEWORK, Sumit Semwal

On Thu, Aug 22, 2019 at 11:14 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 21.08.19 at 22:22, Daniel Vetter wrote:
> > On Wed, Aug 21, 2019 at 10:11 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >> Quoting Christian König (2019-08-21 13:31:37)
> >>> Hi everyone,
> >>>
> >>> In previous discussion it surfaced that different drivers use the shared and explicit fences in the dma_resv object with different meanings.
> >>>
> >>> This is problematic when we share buffers between those drivers and requirements for implicit and explicit synchronization leaded to quite a number of workarounds related to this.
> >>>
> >>> So I started an effort to get all drivers back to a common understanding of what the fences in the dma_resv object mean and be able to use the object for different kind of workloads independent of the classic DRM command submission interface.
> >>>
> >>> The result is this patch set which modifies the dma_resv API to get away from a single explicit fence and multiple shared fences, towards a notation where we have explicit categories for writers, readers and others.
> >> Fwiw, I would like the distinction here between optional fences
> >> (writers, readers) and mandatory fences (others). The optional fences
> >> are where we put the implicit fence tracking that clever userspace would
> >> rather avoid. The mandatory fences (I would call internal) is where we
> >> put the fences for tracking migration that userspace can not opt out of.
> > I think this would make sense, and is kinda what I expected here.
>
> Yeah, that's exactly the intention here.
>
> Basic idea is to group the fences into the categories of "you always
> need to wait for when doing implicit synchronization" (writers), "you
> only need to wait for them when you want to write to the object"
> (readers) and "ignore them for implicit synchronization".
>
> > If (and I think that's a huge if) we can agree on what those internal
> > fences are. There's a huge difference between internal fences for
> > buffer moves (better not ignore those) and internal fences like
> > amdkfd's eviction fence (better ignore those).
>
> Yeah, that's exactly why I want to get away from that exclusive/shared
> naming.

The bikeshed was epic. The idea behind exclusive/shared was that you
might want to ignore writers (like amdgpu does for internal buffers),
so shared doesn't necessarily mean it only contains readers; there
might also be writers in there. But only writers who are coordinating
their writes through some other means.

For writers the reason for going with exclusive was again the above,
that you might not want to put all writers into the exclusive slot
(amdgpu doesn't, at least for internal stuff). Also, some exclusive
fences might not be traditional writers, but other stuff like bo
moves.

But clearly amdkfd_eviction fence doesn't fit into this scheme. And on
the other hand we might want to have better rules to differentiate
between writers/readers for implicit sync and the stuff the kernel does
on top.
exclusive fence, since you have no idea why exactly it is exclusive -
it could be implicit sync, or it could be a bo move, or something else
entirely. At least for foreign fences.
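
Roughly, the only safe rule for foreign fences today looks like this 
(sketch against the pre-series API; wait_for_fence() and the writing 
flag are just stand-ins, locking/RCU and error handling omitted):

	struct dma_resv_list *list;
	struct dma_fence *excl;
	unsigned int i;

	/* Always wait for the exclusive fence, whatever it is for. */
	excl = rcu_dereference(resv->fence_excl);
	if (excl)
		wait_for_fence(excl);

	/* Shared fences only matter when we want to write. */
	list = rcu_dereference(resv->fence);
	if (writing && list)
		for (i = 0; i < list->shared_count; ++i)
			wait_for_fence(rcu_dereference(list->shared[i]));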

For your own fences I don't think any of this matters, and essentially
you can treat them all as just an overall list of fences on your bo.
E.g. you could also treat the exclusive fence slot as a shared fence
slot for internal purposes, if the driver-internal semantics allow
that.

> For readers/writers I hoped the semantic would be more clear, but that
> doesn't seem to be the case.
>
> > So whatever we do add, it better come with really clear docs and pretty diagrams about what
> > it's supposed to do, and how it's supposed to be used. Or we're just
> > back to the current mess we're in, times two.
>
> Well documenting it in the end is clearly a good idea, but I don't think
> we should start with that before we actually know what we want to
> implement and how we want to implement it.
>
> Otherwise I would write tons of documentation which can be thrown away
> again in the end because we decided not to do it this way.

Yeah there's a bit a problem there. So for your RFC I guess the main
goal is the "other" bucket, which is both going to be a huge bikeshed
(not again ...) but also really tricky to figure out the semantics. If
the goal with "other" is to use that for bo moves, and not for
amdkfd_eviction (that one I feel like doesn't fit, at least amdgpu
code tends to ignore it in many cases), then maybe we should call it
kernel_exclusive? Or something else that indicates it's a level above
the normal exclusive, kinda like the usual user/kernel split. And if you ignore
the kernel_exclusive fence then the kernel will break (and not just
some data corruption visible to userspace).
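
I.e. waiting would decompose into something like (purely hypothetical 
sketch, kernel_exclusive is just the name proposed above):

	/* Never optional, skipping this breaks the kernel: */
	wait_for_fence(kernel_exclusive);

	if (!opted_out_of_implicit_sync) {
		wait_for_fence(exclusive);		/* writers */
		if (writing)
			wait_for_fences(shared);	/* readers */
	}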

Or is there something else you want to clear up? I mean aside from the
reader/writer vs exclusive/shared thing, imo that's easier to decide
once we know what "other" is exactly like.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/10] dma-buf/resv: add other operations
  2019-08-21 12:31   ` Christian König
@ 2019-08-22 12:28     ` Ville Syrjälä
  -1 siblings, 0 replies; 66+ messages in thread
From: Ville Syrjälä @ 2019-08-22 12:28 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On Wed, Aug 21, 2019 at 02:31:47PM +0200, Christian König wrote:
> In addition to readers and writers, add another class of operations
> which never participate in implicit synchronization.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-resv.c | 27 ++++++++++++++++++++++++---
>  include/linux/dma-resv.h   |  2 ++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 8ef7dbc7fd8e..c6dd6c36dba2 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -280,6 +280,7 @@ void dma_resv_init(struct dma_resv *obj)
>  
>  	dma_resv_fences_init(&obj->writers);
>  	dma_resv_fences_init(&obj->readers);
> +	dma_resv_fences_init(&obj->others);
>  }
>  EXPORT_SYMBOL(dma_resv_init);
>  
> @@ -295,6 +296,7 @@ void dma_resv_fini(struct dma_resv *obj)
>  	 */
>  	dma_resv_fences_fini(&obj->writers);
>  	dma_resv_fences_fini(&obj->readers);
> +	dma_resv_fences_fini(&obj->others);
>  	ww_mutex_destroy(&obj->lock);
>  }
>  EXPORT_SYMBOL(dma_resv_fini);
> @@ -334,6 +336,10 @@ void dma_resv_prune_fences(struct dma_resv *obj)
>  	fence = dma_resv_fences_deref(obj, &obj->readers);
>  	if (dma_fence_is_signaled(fence))
>  		dma_resv_fences_set(obj, &obj->readers, NULL);
> +
> +	fence = dma_resv_fences_deref(obj, &obj->others);
> +	if (dma_fence_is_signaled(fence))
> +		dma_resv_fences_set(obj, &obj->others, NULL);
>  }
>  EXPORT_SYMBOL(dma_resv_prune_fences);
>  
> @@ -346,17 +352,19 @@ EXPORT_SYMBOL(dma_resv_prune_fences);
>  */
>  int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
>  {
> -	struct dma_fence *writers, *readers;
> +	struct dma_fence *writers, *readers, *others;
>  
>  	dma_resv_assert_held(dst);
>  
>  	rcu_read_lock();
>  	writers = dma_resv_fences_get_rcu(&src->writers);
>  	readers = dma_resv_fences_get_rcu(&src->readers);
> +	others = dma_resv_fences_get_rcu(&src->others);
>  	rcu_read_unlock();
>  
>  	dma_resv_fences_set(dst, &dst->writers, writers);
>  	dma_resv_fences_set(dst, &dst->readers, readers);
> +	dma_resv_fences_set(dst, &dst->readers, others);
                                       ^^^^^^^

others?


-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/10] dma-buf/resv: add new fences container implementation
  2019-08-22  8:23       ` Christian König
@ 2019-08-22 13:02         ` Daniel Vetter
  -1 siblings, 0 replies; 66+ messages in thread
From: Daniel Vetter @ 2019-08-22 13:02 UTC (permalink / raw)
  To: christian.koenig
  Cc: Daniel Vetter, dri-devel, chris, daniel.vetter, sumit.semwal,
	linux-media, linaro-mm-sig

On Thu, Aug 22, 2019 at 10:23:29AM +0200, Christian König wrote:
> On 21.08.19 at 18:04, Daniel Vetter wrote:
> > On Wed, Aug 21, 2019 at 02:31:44PM +0200, Christian König wrote:
> > > [SNIP]
> > > +	/* Try to drop the last reference */
> > > +	if (!dma_fence_array_recycle(staged))
> > Without an rcu barrier here you're not syncing to new clients at all.
> > I don't think this works, and I expect that once you've readded all the
> > barriers and retry loops we're back to seqlocks.
> 
> The key difference is that RCU users now use dma_fence_get_rcu_safe() to
> grab a reference to the current set of fences.
> 
> In other words the whole array is reference counted and RCU protected
> instead of each individual entry in the array.
> 
> This way you don't need the sequence count any more because you grab a
> reference to all of them at once and then can be sure that they don't
> change.

Hm yeah ... I think there are still some users left that have an open-coded
RCU section though. But yeah, if you can convince Chris that this is ok I
think it makes sense as an overall cleanup of the hand-rolled fences array
we have for shared fences. But I'd really like to untangle it from the
entire semantics discussion, since that seems entirely unrelated.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/10] dma-buf/resv: add new fences container implementation
  2019-08-22 13:02         ` Daniel Vetter
@ 2019-08-22 13:53           ` Koenig, Christian
  -1 siblings, 0 replies; 66+ messages in thread
From: Koenig, Christian @ 2019-08-22 13:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, chris, daniel.vetter, sumit.semwal, linux-media,
	linaro-mm-sig

On 22.08.19 at 15:02, Daniel Vetter wrote:
> On Thu, Aug 22, 2019 at 10:23:29AM +0200, Christian König wrote:
>> On 21.08.19 at 18:04, Daniel Vetter wrote:
>>> On Wed, Aug 21, 2019 at 02:31:44PM +0200, Christian König wrote:
>>>> [SNIP]
>>>> +	/* Try to drop the last reference */
>>>> +	if (!dma_fence_array_recycle(staged))
>>> Without an rcu barrier here you're not syncing to new clients at all.
>>> I don't think this works, and I expect that once you've readded all the
>>> barriers and retry loops we're back to seqlocks.
>> The key difference is that RCU users now use dma_fence_get_rcu_safe() to
>> grab a reference to the current set of fences.
>>
>> In other words the whole array is reference counted and RCU protected
>> instead of each individual entry in the array.
>>
>> This way you don't need the sequence count any more because you grab a
>> reference to all of them at once and then can be sure that they don't
>> change.
> Hm yeah ... I think there's still some users left that have an open-coded
> rcu section though. But yeah if you can concince Chris that this is ok I
> think it makes sense as an overall cleanup of the hand-rolled fences array
> we have for shared fences. But I'd really like to untangle it from the
> entire semantics discussion, since that seems entirely unrelated.

Yeah, agree. Untangling that is a really good idea.

Going to send out the dma_fence_array as a replacement for shared fences 
separately first.

Christian.

> -Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container
  2019-08-21 12:31   ` Christian König
@ 2019-08-24 13:22     ` Chris Wilson
  -1 siblings, 0 replies; 66+ messages in thread
From: Chris Wilson @ 2019-08-24 13:22 UTC (permalink / raw)
  To: Christian König, daniel.vetter, dri-devel, linaro-mm-sig,
	linux-media, sumit.semwal

Quoting Christian König (2019-08-21 13:31:45)
> @@ -528,20 +352,9 @@ void dma_resv_prune_fences(struct dma_resv *obj)
>                 dma_fence_put(fence);
>         }
>  
> -       list = dma_resv_get_list(obj);
> -       if (!list)
> -               return;
> -
> -       for (i = 0; i < list->shared_count; ++i) {
> -               fence = rcu_dereference_protected(list->shared[i],
> -                                                 dma_resv_held(obj));
> -
> -               if (!dma_fence_is_signaled(fence))
> -                       continue;
> -
> -               RCU_INIT_POINTER(list->shared[i], dma_fence_get_stub());
> -               dma_fence_put(fence);
> -       }
> +       fence = dma_resv_fences_deref(obj, &obj->readers);
> +       if (dma_fence_is_signaled(fence))
> +               dma_resv_fences_set(obj, &obj->readers, NULL);

Something to note is that a dma-fence-array is not automatically
signaled and dma_fence_is_signaled() does not check the array.
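
So pruning based on dma_fence_is_signaled() may need to kick signaling 
first, e.g. (sketch, using the existing dma_fence_enable_sw_signaling() 
helper):

	fence = dma_resv_fences_deref(obj, &obj->readers);
	if (fence && !dma_fence_is_signaled(fence))
		/* Installs callbacks on the array members so the pending
		 * count, and thus dma_fence_is_signaled(), can progress.
		 */
		dma_fence_enable_sw_signaling(fence);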
-Chris

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread

Thread overview: 33 messages
2019-08-21 12:31 [RFC] replacing dma_resv API Christian König
2019-08-21 12:31 ` [PATCH 01/10] dma-buf: make to_dma_fence_array NULL safe Christian König
2019-08-21 12:31 ` [PATCH 02/10] dma-buf: add dma_fence_array_alloc/free Christian König
2019-08-21 12:31 ` [PATCH 03/10] dma-buf: add dma_fence_array_recycle Christian König
2019-08-21 16:24   ` Chris Wilson
2019-08-22  8:38     ` Christian König
2019-08-21 12:31 ` [PATCH 04/10] dma-buf: add dma_fence_array_for_each Christian König
2019-08-21 12:31 ` [PATCH 05/10] dma-buf/resv: add dma_resv_prune_fences Christian König
2019-08-21 14:55   ` Chris Wilson
2019-08-21 14:56     ` Chris Wilson
2019-08-21 12:31 ` [PATCH 06/10] dma-buf/resv: stop pruning shared fences when exclusive is added Christian König
2019-08-21 12:31 ` [PATCH 07/10] dma-buf/resv: add new fences container implementation Christian König
2019-08-21 16:04   ` Daniel Vetter
2019-08-22  8:23     ` Christian König
2019-08-22 13:02       ` Daniel Vetter
2019-08-22 13:53         ` Koenig, Christian
2019-08-21 12:31 ` [PATCH 08/10] dma-buf/resv: replace shared fence with new fences container Christian König
2019-08-21 15:24   ` Chris Wilson
2019-08-21 17:35     ` Chris Wilson
2019-08-22  8:37       ` Christian König
2019-08-22  9:16         ` Christian König
2019-08-21 16:21   ` Chris Wilson
2019-08-24 13:22   ` Chris Wilson
2019-08-21 12:31 ` [PATCH 09/10] dma-buf/resv: replace exclusive " Christian König
2019-08-21 12:31 ` [PATCH 10/10] dma-buf/resv: add other operations Christian König
2019-08-22 12:28   ` Ville Syrjälä
2019-08-21 16:13 ` [RFC] replacing dma_resv API Daniel Vetter
2019-08-21 20:05   ` Daniel Vetter
2019-08-22  9:27     ` Christian König
2019-08-21 20:11 ` Chris Wilson
2019-08-21 20:22   ` Daniel Vetter
2019-08-22  9:14     ` Christian König
2019-08-22 10:00       ` Daniel Vetter
