* Change how amdgpu stores fences in dma_resv objects
@ 2021-06-10  9:17 ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Since we can't find a consensus on how to move forward with the dma_resv object, I concentrated on changing the approach for amdgpu first.

This new approach changes how the driver stores the command submission fence in the dma_resv object in DMA-buf exported BOs.

For exported BOs we now store the CS fence in a dma_fence_chain container and assign that one to the exclusive fences slot.

During synchronization this dma_fence_chain container is unpacked again and the contained fences are handled individually.

This has a little more overhead than the old approach, but it allows waiting on the exclusive slot for writes again.

Regards,
Christian.




_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 1/7] dma-buf: some dma_fence_chain improvements
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

The callback and the irq work are never used at the same
time. Putting them into a union saves us 24 bytes and
makes the structure only 120 bytes in size.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c | 2 +-
 include/linux/dma-fence-chain.h   | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 7d129e68ac70..1b4cb3e5cec9 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -137,6 +137,7 @@ static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
 	struct dma_fence_chain *chain;
 
 	chain = container_of(cb, typeof(*chain), cb);
+	init_irq_work(&chain->work, dma_fence_chain_irq_work);
 	irq_work_queue(&chain->work);
 	dma_fence_put(f);
 }
@@ -239,7 +240,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 	rcu_assign_pointer(chain->prev, prev);
 	chain->fence = fence;
 	chain->prev_seqno = 0;
-	init_irq_work(&chain->work, dma_fence_chain_irq_work);
 
 	/* Try to reuse the context of the previous chain node. */
 	if (prev_chain && __dma_fence_is_later(seqno, prev->seqno, prev->ops)) {
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 10462a029da2..9d6a062be640 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -25,12 +25,14 @@
  */
 struct dma_fence_chain {
 	struct dma_fence base;
-	spinlock_t lock;
 	struct dma_fence __rcu *prev;
 	u64 prev_seqno;
 	struct dma_fence *fence;
-	struct dma_fence_cb cb;
-	struct irq_work work;
+	union {
+		struct dma_fence_cb cb;
+		struct irq_work work;
+	};
+	spinlock_t lock;
 };
 
 extern const struct dma_fence_ops dma_fence_chain_ops;
-- 
2.25.1




* [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Add a common allocation helper, cleaning up the mix of kzalloc/kmalloc
and removing some unused code in the selftest.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/st-dma-fence-chain.c          | 16 ++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 ++--
 drivers/gpu/drm/drm_syncobj.c                 |  6 ++---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  6 ++---
 drivers/gpu/drm/msm/msm_gem_submit.c          |  6 ++---
 include/linux/dma-fence-chain.h               | 22 +++++++++++++++++++
 6 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 9525f7f56119..8ce1ea59d31b 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -58,28 +58,20 @@ static struct dma_fence *mock_fence(void)
 	return &f->base;
 }
 
-static inline struct mock_chain {
-	struct dma_fence_chain base;
-} *to_mock_chain(struct dma_fence *f) {
-	return container_of(f, struct mock_chain, base.base);
-}
-
 static struct dma_fence *mock_chain(struct dma_fence *prev,
 				    struct dma_fence *fence,
 				    u64 seqno)
 {
-	struct mock_chain *f;
+	struct dma_fence_chain *f;
 
-	f = kmalloc(sizeof(*f), GFP_KERNEL);
+	f = dma_fence_chain_alloc();
 	if (!f)
 		return NULL;
 
-	dma_fence_chain_init(&f->base,
-			     dma_fence_get(prev),
-			     dma_fence_get(fence),
+	dma_fence_chain_init(f, dma_fence_get(prev), dma_fence_get(fence),
 			     seqno);
 
-	return &f->base.base;
+	return &f->base;
 }
 
 static int sanitycheck(void *arg)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 90136f9dedd6..325e82621467 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1124,7 +1124,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
 
 		dep->chain = NULL;
 		if (syncobj_deps[i].point) {
-			dep->chain = kmalloc(sizeof(*dep->chain), GFP_KERNEL);
+			dep->chain = dma_fence_chain_alloc();
 			if (!dep->chain)
 				return -ENOMEM;
 		}
@@ -1132,7 +1132,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
 		dep->syncobj = drm_syncobj_find(p->filp,
 						syncobj_deps[i].handle);
 		if (!dep->syncobj) {
-			kfree(dep->chain);
+			dma_fence_chain_free(dep->chain);
 			return -EINVAL;
 		}
 		dep->point = syncobj_deps[i].point;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index fdd2ec87cdd1..1c5b9ef6da37 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -861,7 +861,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
 				     &fence);
 	if (ret)
 		goto err;
-	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+	chain = dma_fence_chain_alloc();
 	if (!chain) {
 		ret = -ENOMEM;
 		goto err1;
@@ -1402,10 +1402,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
 		goto err_points;
 	}
 	for (i = 0; i < args->count_handles; i++) {
-		chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+		chains[i] = dma_fence_chain_alloc();
 		if (!chains[i]) {
 			for (j = 0; j < i; j++)
-				kfree(chains[j]);
+				dma_fence_chain_free(chains[j]);
 			ret = -ENOMEM;
 			goto err_chains;
 		}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 66789111a24b..a22cb86730b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
 	while (n--) {
 		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
 		dma_fence_put(fences[n].dma_fence);
-		kfree(fences[n].chain_fence);
+		dma_fence_chain_free(fences[n].chain_fence);
 	}
 	kvfree(fences);
 }
@@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
 				return -EINVAL;
 			}
 
-			f->chain_fence =
-				kmalloc(sizeof(*f->chain_fence),
-					GFP_KERNEL);
+			f->chain_fence = dma_fence_chain_alloc();
 			if (!f->chain_fence) {
 				drm_syncobj_put(syncobj);
 				dma_fence_put(fence);
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 5480852bdeda..6ff6df6c4791 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
 				break;
 			}
 
-			post_deps[i].chain =
-				kmalloc(sizeof(*post_deps[i].chain),
-				        GFP_KERNEL);
+			post_deps[i].chain = dma_fence_chain_alloc();
 			if (!post_deps[i].chain) {
 				ret = -ENOMEM;
 				break;
@@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
 
 	if (ret) {
 		for (j = 0; j <= i; ++j) {
-			kfree(post_deps[j].chain);
+			dma_fence_chain_free(post_deps[j].chain);
 			if (post_deps[j].syncobj)
 				drm_syncobj_put(post_deps[j].syncobj);
 		}
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 9d6a062be640..5f45689a6d2e 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -12,6 +12,7 @@
 
 #include <linux/dma-fence.h>
 #include <linux/irq_work.h>
+#include <linux/slab.h>
 
 /**
  * struct dma_fence_chain - fence to represent an node of a fence chain
@@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
 	return container_of(fence, struct dma_fence_chain, base);
 }
 
+/**
+ * dma_fence_chain_alloc
+ *
+ * Returns a new dma_fence_chain object
+ */
+static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
+{
+	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+};
+
+/**
+ * dma_fence_chain_free
+ * @chain: chain node to free
+ *
+ * Frees up an allocated but not used dma_fence_chain node.
+ */
+static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
+{
+	kfree(chain);
+};
+
 /**
  * dma_fence_chain_for_each - iterate over all fences in chain
  * @iter: current fence
-- 
2.25.1




* [PATCH 3/7] dma-buf: add dma_fence_chain_alloc/free self tests
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Exercise the newly added functions.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/st-dma-fence-chain.c | 48 ++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 8ce1ea59d31b..855c129c6093 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -95,6 +95,53 @@ static int sanitycheck(void *arg)
 	return err;
 }
 
+static int __alloc_free(void *arg)
+{
+	atomic_t *counter = arg;
+	int i, j;
+
+	for (i = 0; i < 1024; ++i) {
+		struct dma_fence_chain *chains[64];
+
+		for (j = 0; j < ARRAY_SIZE(chains); ++j)
+			chains[j] = dma_fence_chain_alloc();
+
+		for (j = 0; j < ARRAY_SIZE(chains); ++j)
+			dma_fence_chain_free(chains[j]);
+
+		atomic_add(ARRAY_SIZE(chains), counter);
+	}
+	return 0;
+}
+
+static int alloc_free(void *arg)
+{
+	struct task_struct *threads[8];
+	atomic_t counter = ATOMIC_INIT(0);
+	int i, err = 0;
+
+	for (i = 0; i < ARRAY_SIZE(threads); i++) {
+		threads[i] = kthread_run(__alloc_free, &counter, "dmabuf/%d",
+					 i);
+		if (IS_ERR(threads[i])) {
+			err = PTR_ERR(threads[i]);
+			break;
+		}
+	}
+
+	while (i--) {
+		int ret;
+
+		ret = kthread_stop(threads[i]);
+		if (ret && !err)
+			err = ret;
+	}
+
+	pr_info("Completed %u cycles\n", atomic_read(&counter));
+
+	return err;
+}
+
 struct fence_chains {
 	unsigned int chain_length;
 	struct dma_fence **fences;
@@ -677,6 +724,7 @@ int dma_fence_chain(void)
 {
 	static const struct subtest tests[] = {
 		SUBTEST(sanitycheck),
+		SUBTEST(alloc_free),
 		SUBTEST(find_seqno),
 		SUBTEST(find_signaled),
 		SUBTEST(find_out_of_order),
-- 
2.25.1




* [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Add some rather sophisticated lockless garbage collection
for dma_fence_chain objects.

For this, keep all initialized dma_fence_chain nodes on a
queue and trigger garbage collection before a new one is
allocated.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
 include/linux/dma-fence-chain.h   |   5 +
 2 files changed, 142 insertions(+), 23 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 1b4cb3e5cec9..c2f0b69eabb7 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,8 +9,53 @@
 
 #include <linux/dma-fence-chain.h>
 
+static struct dma_fence_chain __rcu *fifo_front;
+static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
+
 static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
 
+/**
 + * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
+ * @chain: the chain node to enqueue
+ *
+ * Add the chain node to the end of the gc fifo.
+ */
+static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
+{
+	struct dma_fence_chain __rcu **tmp;
+
+	RCU_INIT_POINTER(chain->next, NULL);
+	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
+		   &chain->next);
+
+	/* This is intentionally unordered since we only need the fifo for gc */
+	rcu_assign_pointer(*tmp, chain);
+}
+
+/**
 + * dma_fence_chain_dequeue - dequeue a chain node for garbage collection
+ *
 + * Remove the first chain node from the gc fifo while making sure to always
 + * keep at least one node on the fifo for the lockless fifo implementation.
+ * Returns the dequeued chain node or NULL.
+ */
+static struct dma_fence_chain *dma_fence_chain_dequeue(void)
+{
+	struct dma_fence_chain *chain, *tmp;
+
+	rcu_read_lock();
+	chain = rcu_dereference(fifo_front);
+	/* Never dequeue the last chain node for lockless fifo */
+	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
+		      chain, rcu_access_pointer(chain->next));
+	rcu_read_unlock();
+	return tmp == chain ? chain : NULL;
+}
+
 /**
  * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
  * @chain: chain node to get the previous node from
@@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
 	return prev;
 }
 
+/**
+ * dma_fence_chain_try_replace - try to replace the prev node
+ * @chain: Chain node we try to replace prev.
+ * @prev: the old prev node
+ *
+ * Try to replace the previous chain node when it or its containing fence is
+ * signaled. Returns true if we tried, false if we need to wait.
+ */
+static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
+					struct dma_fence *prev)
+{
+	struct dma_fence *replacement, *tmp;
+	struct dma_fence_chain *prev_chain;
+
+	prev_chain = to_dma_fence_chain(prev);
+	if (prev_chain) {
+		if (!dma_fence_is_signaled(prev_chain->fence))
+			return false;
+
+		replacement = dma_fence_chain_get_prev(prev_chain);
+	} else {
+		if (!dma_fence_is_signaled(prev))
+			return false;
+
+		replacement = NULL;
+	}
+
+	tmp = cmpxchg((struct dma_fence __force **)&chain->prev, prev,
+		      replacement);
+	if (tmp == prev)
+		dma_fence_put(tmp);
+	else
+		dma_fence_put(replacement);
+
+	return true;
+}
+
 /**
  * dma_fence_chain_walk - chain walking function
  * @fence: current chain node
@@ -38,8 +120,8 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
  */
 struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 {
-	struct dma_fence_chain *chain, *prev_chain;
-	struct dma_fence *prev, *replacement, *tmp;
+	struct dma_fence_chain *chain;
+	struct dma_fence *prev;
 
 	chain = to_dma_fence_chain(fence);
 	if (!chain) {
@@ -48,26 +130,8 @@ struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 	}
 
 	while ((prev = dma_fence_chain_get_prev(chain))) {
-
-		prev_chain = to_dma_fence_chain(prev);
-		if (prev_chain) {
-			if (!dma_fence_is_signaled(prev_chain->fence))
-				break;
-
-			replacement = dma_fence_chain_get_prev(prev_chain);
-		} else {
-			if (!dma_fence_is_signaled(prev))
-				break;
-
-			replacement = NULL;
-		}
-
-		tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
-			      prev, replacement);
-		if (tmp == prev)
-			dma_fence_put(tmp);
-		else
-			dma_fence_put(replacement);
+		if (!dma_fence_chain_try_replace(chain, prev))
+			break;
 		dma_fence_put(prev);
 	}
 
@@ -205,7 +269,21 @@ static void dma_fence_chain_release(struct dma_fence *fence)
 	dma_fence_put(prev);
 
 	dma_fence_put(chain->fence);
-	dma_fence_free(fence);
+	WRITE_ONCE(chain->fence, NULL);
+
+	/*
+	 * Don't garbage collect here to avoid recursion and potential stack
+	 * overflow.
+	 */
+	chain = dma_fence_chain_dequeue();
+	if (!chain)
+		return;
+
+	if (kref_read(&chain->base.refcount) ||
+	    READ_ONCE(chain->fence))
+		dma_fence_chain_enqueue(chain);
+	else
+		dma_fence_free(&chain->base);
 }
 
 const struct dma_fence_ops dma_fence_chain_ops = {
@@ -218,6 +296,40 @@ const struct dma_fence_ops dma_fence_chain_ops = {
 };
 EXPORT_SYMBOL(dma_fence_chain_ops);
 
+/**
+ * dma_fence_chain_garbage_collect - cleanup chain nodes
+ *
+ * Do some garbage collection and try to release chain nodes.
+ */
+void dma_fence_chain_garbage_collect(void)
+{
+	struct dma_fence_chain *chain = dma_fence_chain_dequeue();
+
+	if (!chain)
+		return;
+
+	if (!dma_fence_get_rcu(&chain->base)) {
+		/* Unused, check if it's also clean */
+		if (likely(!READ_ONCE(chain->fence))) {
+			dma_fence_free(&chain->base);
+			return;
+		}
+
+	} else {
+		struct dma_fence *prev;
+
+		/* Used, do some chain walk */
+		prev = dma_fence_chain_get_prev(chain);
+		if (prev) {
+			dma_fence_chain_try_replace(chain, prev);
+			dma_fence_put(prev);
+		}
+		dma_fence_put(&chain->base);
+	}
+	dma_fence_chain_enqueue(chain);
+}
+EXPORT_SYMBOL(dma_fence_chain_garbage_collect);
+
 /**
  * dma_fence_chain_init - initialize a fence chain
  * @chain: the chain node to initialize
@@ -254,5 +366,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 
 	dma_fence_init(&chain->base, &dma_fence_chain_ops,
 		       &chain->lock, context, seqno);
+	dma_fence_chain_enqueue(chain);
 }
+
 EXPORT_SYMBOL(dma_fence_chain_init);
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 5f45689a6d2e..b412b5396434 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -19,6 +19,7 @@
  * @base: fence base class
  * @lock: spinlock for fence handling
  * @prev: previous fence of the chain
+ * @next: next chain node for garbage collection
  * @prev_seqno: original previous seqno before garbage collection
  * @fence: encapsulated fence
  * @cb: callback structure for signaling
@@ -27,6 +28,7 @@
 struct dma_fence_chain {
 	struct dma_fence base;
 	struct dma_fence __rcu *prev;
+	struct dma_fence_chain __rcu *next;
 	u64 prev_seqno;
 	struct dma_fence *fence;
 	union {
@@ -38,6 +40,8 @@ struct dma_fence_chain {
 
 extern const struct dma_fence_ops dma_fence_chain_ops;
 
+void dma_fence_chain_garbage_collect(void);
+
 /**
  * to_dma_fence_chain - cast a fence to a dma_fence_chain
  * @fence: fence to cast to a dma_fence_array
@@ -61,6 +65,7 @@ to_dma_fence_chain(struct dma_fence *fence)
  */
 static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
 {
+	dma_fence_chain_garbage_collect();
 	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
 };
 
-- 
2.25.1



* [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
@ 2021-06-10  9:17   ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Add some rather sophisticated lockless garbage collection
for dma_fence_chain objects.

For this, keep all initialized dma_fence_chain nodes on a
queue and trigger garbage collection before a new one is
allocated.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
 include/linux/dma-fence-chain.h   |   5 +
 2 files changed, 142 insertions(+), 23 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index 1b4cb3e5cec9..c2f0b69eabb7 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,8 +9,53 @@
 
 #include <linux/dma-fence-chain.h>
 
+static struct dma_fence_chain __rcu *fifo_front;
+static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
+
 static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
 
+/**
+ * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
+ * @chain: the chain node to enqueue
+ *
+ * Add the chain node to the end of the gc fifo.
+ */
+static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
+{
+	struct dma_fence_chain __rcu **tmp;
+
+	RCU_INIT_POINTER(chain->next, NULL);
+	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
+		   &chain->next);
+
+	/* This is intentionally unordered since we only need the fifo for gc */
+	rcu_assign_pointer(*tmp, chain);
+}
+
+/**
+ * dma_fence_chain_dequeue - dequeue a chain node for garbage collection
+ *
+ * Remove the first chain node from the gc fifo while making sure to always
+ * keep at least one node on the fifo for the lockless fifo implementation.
+ * Returns the dequeued chain node or NULL.
+ */
+static struct dma_fence_chain *dma_fence_chain_dequeue(void)
+{
+	struct dma_fence_chain *chain, *tmp;
+
+	rcu_read_lock();
+	chain = rcu_dereference(fifo_front);
+	/* Never dequeue the last chain node for lockless fifo */
+	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
+		      chain, rcu_access_pointer(chain->next));
+	rcu_read_unlock();
+	return tmp == chain ? chain : NULL;
+}
+
 /**
  * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
  * @chain: chain node to get the previous node from
@@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
 	return prev;
 }
 
+/**
+ * dma_fence_chain_try_replace - try to replace the prev node
+ * @chain: chain node whose prev we try to replace
+ * @prev: the old prev node
+ *
+ * Try to replace the previous chain node when it or its containing fence is
+ * signaled. Returns true if we tried, false if we need to wait.
+ */
+static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
+					struct dma_fence *prev)
+{
+	struct dma_fence *replacement, *tmp;
+	struct dma_fence_chain *prev_chain;
+
+	prev_chain = to_dma_fence_chain(prev);
+	if (prev_chain) {
+		if (!dma_fence_is_signaled(prev_chain->fence))
+			return false;
+
+		replacement = dma_fence_chain_get_prev(prev_chain);
+	} else {
+		if (!dma_fence_is_signaled(prev))
+			return false;
+
+		replacement = NULL;
+	}
+
+	tmp = cmpxchg((struct dma_fence __force **)&chain->prev, prev,
+		      replacement);
+	if (tmp == prev)
+		dma_fence_put(tmp);
+	else
+		dma_fence_put(replacement);
+
+	return true;
+}
+
 /**
  * dma_fence_chain_walk - chain walking function
  * @fence: current chain node
@@ -38,8 +120,8 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
  */
 struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 {
-	struct dma_fence_chain *chain, *prev_chain;
-	struct dma_fence *prev, *replacement, *tmp;
+	struct dma_fence_chain *chain;
+	struct dma_fence *prev;
 
 	chain = to_dma_fence_chain(fence);
 	if (!chain) {
@@ -48,26 +130,8 @@ struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 	}
 
 	while ((prev = dma_fence_chain_get_prev(chain))) {
-
-		prev_chain = to_dma_fence_chain(prev);
-		if (prev_chain) {
-			if (!dma_fence_is_signaled(prev_chain->fence))
-				break;
-
-			replacement = dma_fence_chain_get_prev(prev_chain);
-		} else {
-			if (!dma_fence_is_signaled(prev))
-				break;
-
-			replacement = NULL;
-		}
-
-		tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
-			      prev, replacement);
-		if (tmp == prev)
-			dma_fence_put(tmp);
-		else
-			dma_fence_put(replacement);
+		if (!dma_fence_chain_try_replace(chain, prev))
+			break;
 		dma_fence_put(prev);
 	}
 
@@ -205,7 +269,21 @@ static void dma_fence_chain_release(struct dma_fence *fence)
 	dma_fence_put(prev);
 
 	dma_fence_put(chain->fence);
-	dma_fence_free(fence);
+	WRITE_ONCE(chain->fence, NULL);
+
+	/*
+	 * Don't garbage collect here to avoid recursion and potential stack
+	 * overflow.
+	 */
+	chain = dma_fence_chain_dequeue();
+	if (!chain)
+		return;
+
+	if (kref_read(&chain->base.refcount) ||
+	    READ_ONCE(chain->fence))
+		dma_fence_chain_enqueue(chain);
+	else
+		dma_fence_free(&chain->base);
 }
 
 const struct dma_fence_ops dma_fence_chain_ops = {
@@ -218,6 +296,40 @@ const struct dma_fence_ops dma_fence_chain_ops = {
 };
 EXPORT_SYMBOL(dma_fence_chain_ops);
 
+/**
+ * dma_fence_chain_garbage_collect - cleanup chain nodes
+ *
+ * Do some garbage collection and try to release chain nodes.
+ */
+void dma_fence_chain_garbage_collect(void)
+{
+	struct dma_fence_chain *chain = dma_fence_chain_dequeue();
+
+	if (!chain)
+		return;
+
+	if (!dma_fence_get_rcu(&chain->base)) {
+		/* Unused, check if it's also clean */
+		if (likely(!READ_ONCE(chain->fence))) {
+			dma_fence_free(&chain->base);
+			return;
+		}
+
+	} else {
+		struct dma_fence *prev;
+
+		/* Used, do some chain walk */
+		prev = dma_fence_chain_get_prev(chain);
+		if (prev) {
+			dma_fence_chain_try_replace(chain, prev);
+			dma_fence_put(prev);
+		}
+		dma_fence_put(&chain->base);
+	}
+	dma_fence_chain_enqueue(chain);
+}
+EXPORT_SYMBOL(dma_fence_chain_garbage_collect);
+
 /**
  * dma_fence_chain_init - initialize a fence chain
  * @chain: the chain node to initialize
@@ -254,5 +366,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
 
 	dma_fence_init(&chain->base, &dma_fence_chain_ops,
 		       &chain->lock, context, seqno);
+	dma_fence_chain_enqueue(chain);
 }
+
 EXPORT_SYMBOL(dma_fence_chain_init);
diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
index 5f45689a6d2e..b412b5396434 100644
--- a/include/linux/dma-fence-chain.h
+++ b/include/linux/dma-fence-chain.h
@@ -19,6 +19,7 @@
  * @base: fence base class
  * @lock: spinlock for fence handling
  * @prev: previous fence of the chain
+ * @next: next chain node for garbage collection
  * @prev_seqno: original previous seqno before garbage collection
  * @fence: encapsulated fence
  * @cb: callback structure for signaling
@@ -27,6 +28,7 @@
 struct dma_fence_chain {
 	struct dma_fence base;
 	struct dma_fence __rcu *prev;
+	struct dma_fence_chain __rcu *next;
 	u64 prev_seqno;
 	struct dma_fence *fence;
 	union {
@@ -38,6 +40,8 @@ struct dma_fence_chain {
 
 extern const struct dma_fence_ops dma_fence_chain_ops;
 
+void dma_fence_chain_garbage_collect(void);
+
 /**
  * to_dma_fence_chain - cast a fence to a dma_fence_chain
  * @fence: fence to cast to a dma_fence_array
@@ -61,6 +65,7 @@ to_dma_fence_chain(struct dma_fence *fence)
  */
 static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
 {
+	dma_fence_chain_garbage_collect();
 	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
 };
 
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 5/7] drm/syncobj: drop the manual garbage collection
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Not needed any more since dma_fence_chain objects take care of this now.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 1c5b9ef6da37..e6d144775a87 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -304,9 +304,6 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
 	list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
 		syncobj_wait_syncobj_func(syncobj, cur);
 	spin_unlock(&syncobj->lock);
-
-	/* Walk the chain once to trigger garbage collection */
-	dma_fence_chain_for_each(fence, prev);
 	dma_fence_put(prev);
 }
 EXPORT_SYMBOL(drm_syncobj_add_point);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:17   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:17 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Unwrap the explicit fence if it is a dma_fence_chain and
sync to the first fence not matching the owner rules.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
 1 file changed, 68 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 1b2ceccaf5b0..862eb3c1c4c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -28,6 +28,8 @@
  *    Christian König <christian.koenig@amd.com>
  */
 
+#include <linux/dma-fence-chain.h>
+
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
 	return amdgpu_sync_fence(sync, fence);
 }
 
+/* Determine based on the owner and mode if we should sync to a fence or not */
+static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
+				   enum amdgpu_sync_mode mode,
+				   void *owner, struct dma_fence *f)
+{
+	void *fence_owner = amdgpu_sync_get_owner(f);
+
+	/* Always sync to moves, no matter what */
+	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
+		return true;
+
+	/* We only want to trigger KFD eviction fences on
+	 * evict or move jobs. Skip KFD fences otherwise.
+	 */
+	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
+	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+		return false;
+
+	/* Never sync to VM updates either. */
+	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
+	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+		return false;
+
+	/* Ignore fences depending on the sync mode */
+	switch (mode) {
+	case AMDGPU_SYNC_ALWAYS:
+		return true;
+
+	case AMDGPU_SYNC_NE_OWNER:
+		if (amdgpu_sync_same_dev(adev, f) &&
+		    fence_owner == owner)
+			return false;
+		break;
+
+	case AMDGPU_SYNC_EQ_OWNER:
+		if (amdgpu_sync_same_dev(adev, f) &&
+		    fence_owner != owner)
+			return false;
+		break;
+
+	case AMDGPU_SYNC_EXPLICIT:
+		return false;
+	}
+
+	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
+	     "Adding eviction fence to sync obj");
+	return true;
+}
+
 /**
  * amdgpu_sync_resv - sync to a reservation object
  *
@@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
 
 	/* always sync to the exclusive fence */
 	f = dma_resv_excl_fence(resv);
-	r = amdgpu_sync_fence(sync, f);
+	dma_fence_chain_for_each(f, f) {
+		struct dma_fence_chain *chain = to_dma_fence_chain(f);
+
+		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
+					   chain->fence : f)) {
+			r = amdgpu_sync_fence(sync, f);
+			dma_fence_put(f);
+			if (r)
+				return r;
+			break;
+		}
+	}
 
 	flist = dma_resv_shared_list(resv);
-	if (!flist || r)
-		return r;
+	if (!flist)
+		return 0;
 
 	for (i = 0; i < flist->shared_count; ++i) {
-		void *fence_owner;
-
 		f = rcu_dereference_protected(flist->shared[i],
 					      dma_resv_held(resv));
 
-		fence_owner = amdgpu_sync_get_owner(f);
-
-		/* Always sync to moves, no matter what */
-		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
+		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
 			r = amdgpu_sync_fence(sync, f);
 			if (r)
-				break;
-		}
-
-		/* We only want to trigger KFD eviction fences on
-		 * evict or move jobs. Skip KFD fences otherwise.
-		 */
-		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
-		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
-			continue;
-
-		/* Never sync to VM updates either. */
-		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
-		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
-			continue;
-
-		/* Ignore fences depending on the sync mode */
-		switch (mode) {
-		case AMDGPU_SYNC_ALWAYS:
-			break;
-
-		case AMDGPU_SYNC_NE_OWNER:
-			if (amdgpu_sync_same_dev(adev, f) &&
-			    fence_owner == owner)
-				continue;
-			break;
-
-		case AMDGPU_SYNC_EQ_OWNER:
-			if (amdgpu_sync_same_dev(adev, f) &&
-			    fence_owner != owner)
-				continue;
-			break;
-
-		case AMDGPU_SYNC_EXPLICIT:
-			continue;
+				return r;
 		}
-
-		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
-		     "Adding eviction fence to sync obj");
-		r = amdgpu_sync_fence(sync, f);
-		if (r)
-			break;
 	}
-	return r;
+	return 0;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 7/7] drm/amdgpu: rework dma_resv handling
  2021-06-10  9:17 ` Christian König
@ 2021-06-10  9:18   ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10  9:18 UTC (permalink / raw)
  To: daniel, dri-devel, amd-gfx

Drop the workaround and instead implement a better solution.

Basically we are now chaining all submissions using a dma_fence_chain
container and adding them as the exclusive fence to the dma_resv object.

This way other drivers can still sync to the single exclusive fence
while amdgpu only syncs to fences from different processes.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 54 +++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 ---------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
 6 files changed, 47 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index a130e766cbdb..c905a4cfc173 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -34,6 +34,7 @@ struct amdgpu_fpriv;
 struct amdgpu_bo_list_entry {
 	struct ttm_validate_buffer	tv;
 	struct amdgpu_bo_va		*bo_va;
+	struct dma_fence_chain		*chain;
 	uint32_t			priority;
 	struct page			**user_pages;
 	bool				user_invalidated;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 325e82621467..f6f3029f958d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -587,6 +587,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		goto out;
 	}
 
+	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+
+		e->bo_va = amdgpu_vm_bo_find(vm, bo);
+
+		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
+			e->chain = dma_fence_chain_alloc();
+			if (!e->chain) {
+				r = -ENOMEM;
+				goto error_validate;
+			}
+		}
+	}
+
 	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
 					  &p->bytes_moved_vis_threshold);
 	p->bytes_moved = 0;
@@ -614,15 +628,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	gws = p->bo_list->gws_obj;
 	oa = p->bo_list->oa_obj;
 
-	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-
-		/* Make sure we use the exclusive slot for shared BOs */
-		if (bo->prime_shared_count)
-			e->tv.num_shared = 0;
-		e->bo_va = amdgpu_vm_bo_find(vm, bo);
-	}
-
 	if (gds) {
 		p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
 		p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
@@ -644,8 +649,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	}
 
 error_validate:
-	if (r)
+	if (r) {
+		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+			dma_fence_chain_free(e->chain);
+			e->chain = NULL;
+		}
 		ttm_eu_backoff_reservation(&p->ticket, &p->validated);
+	}
 out:
 	return r;
 }
@@ -685,9 +695,17 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
 {
 	unsigned i;
 
-	if (error && backoff)
+	if (error && backoff) {
+		struct amdgpu_bo_list_entry *e;
+
+		amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
+			dma_fence_chain_free(e->chain);
+			e->chain = NULL;
+		}
+
 		ttm_eu_backoff_reservation(&parser->ticket,
 					   &parser->validated);
+	}
 
 	for (i = 0; i < parser->num_post_deps; i++) {
 		drm_syncobj_put(parser->post_deps[i].syncobj);
@@ -1260,6 +1278,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
 	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
+	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+		struct dma_resv *resv = e->tv.bo->base.resv;
+		struct dma_fence_chain *chain = e->chain;
+
+		if (!chain)
+			continue;
+
+		dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
+				     dma_fence_get(p->fence), 1);
+
+		rcu_assign_pointer(resv->fence_excl, &chain->base);
+		e->chain = NULL;
+	}
+
 	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
 	mutex_unlock(&p->adev->notifier_lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index c3053b83b80c..23219fc3b28c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -42,48 +42,6 @@
 #include <linux/pci-p2pdma.h>
 #include <linux/pm_runtime.h>
 
-static int
-__dma_resv_make_exclusive(struct dma_resv *obj)
-{
-	struct dma_fence **fences;
-	unsigned int count;
-	int r;
-
-	if (!dma_resv_shared_list(obj)) /* no shared fences to convert */
-		return 0;
-
-	r = dma_resv_get_fences(obj, NULL, &count, &fences);
-	if (r)
-		return r;
-
-	if (count == 0) {
-		/* Now that was unexpected. */
-	} else if (count == 1) {
-		dma_resv_add_excl_fence(obj, fences[0]);
-		dma_fence_put(fences[0]);
-		kfree(fences);
-	} else {
-		struct dma_fence_array *array;
-
-		array = dma_fence_array_create(count, fences,
-					       dma_fence_context_alloc(1), 0,
-					       false);
-		if (!array)
-			goto err_fences_put;
-
-		dma_resv_add_excl_fence(obj, &array->base);
-		dma_fence_put(&array->base);
-	}
-
-	return 0;
-
-err_fences_put:
-	while (count--)
-		dma_fence_put(fences[count]);
-	kfree(fences);
-	return -ENOMEM;
-}
-
 /**
  * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
  *
@@ -110,24 +68,6 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
 	if (r < 0)
 		goto out;
 
-	r = amdgpu_bo_reserve(bo, false);
-	if (unlikely(r != 0))
-		goto out;
-
-	/*
-	 * We only create shared fences for internal use, but importers
-	 * of the dmabuf rely on exclusive fences for implicitly
-	 * tracking write hazards. As any of the current fences may
-	 * correspond to a write, we need to convert all existing
-	 * fences on the reservation object into a single exclusive
-	 * fence.
-	 */
-	r = __dma_resv_make_exclusive(bo->tbo.base.resv);
-	if (r)
-		goto out;
-
-	bo->prime_shared_count++;
-	amdgpu_bo_unreserve(bo);
 	return 0;
 
 out:
@@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
-	if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
-		bo->prime_shared_count--;
-
 	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
 	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
 }
@@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
 	bo = gem_to_amdgpu_bo(gobj);
 	bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
 	bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
-	if (dma_buf->ops != &amdgpu_dmabuf_ops)
-		bo->prime_shared_count = 1;
 
 	dma_resv_unlock(resv);
 	return gobj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 1c3e3b608332..5d82120fc160 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 	case AMDGPU_GEM_OP_SET_PLACEMENT:
-		if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
+		if (robj->tbo.base.import_attach &&
+		    args->value & AMDGPU_GEM_DOMAIN_VRAM) {
 			r = -EINVAL;
 			amdgpu_bo_unreserve(robj);
 			break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 96447e1d4c9c..30951b593809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 		return -EINVAL;
 
 	/* A shared bo cannot be migrated to VRAM */
-	if (bo->prime_shared_count || bo->tbo.base.import_attach) {
+	if (bo->tbo.base.import_attach) {
 		if (domain & AMDGPU_GEM_DOMAIN_GTT)
 			domain = AMDGPU_GEM_DOMAIN_GTT;
 		else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index b35962702278..55d7e90eaa72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -98,7 +98,6 @@ struct amdgpu_bo {
 	struct ttm_buffer_object	tbo;
 	struct ttm_bo_kmap_obj		kmap;
 	u64				flags;
-	unsigned			prime_shared_count;
 	/* per VM structure for page tables and with virtual addresses */
 	struct amdgpu_vm_bo_base	*vm_bo;
 	/* Constant after initialization */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

-	 * of the dmabuf rely on exclusive fences for implicitly
-	 * tracking write hazards. As any of the current fences may
-	 * correspond to a write, we need to convert all existing
-	 * fences on the reservation object into a single exclusive
-	 * fence.
-	 */
-	r = __dma_resv_make_exclusive(bo->tbo.base.resv);
-	if (r)
-		goto out;
-
-	bo->prime_shared_count++;
-	amdgpu_bo_unreserve(bo);
 	return 0;
 
 out:
@@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
-	if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
-		bo->prime_shared_count--;
-
 	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
 	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
 }
@@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
 	bo = gem_to_amdgpu_bo(gobj);
 	bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
 	bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
-	if (dma_buf->ops != &amdgpu_dmabuf_ops)
-		bo->prime_shared_count = 1;
 
 	dma_resv_unlock(resv);
 	return gobj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 1c3e3b608332..5d82120fc160 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 	case AMDGPU_GEM_OP_SET_PLACEMENT:
-		if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
+		if (robj->tbo.base.import_attach &&
+		    args->value & AMDGPU_GEM_DOMAIN_VRAM) {
 			r = -EINVAL;
 			amdgpu_bo_unreserve(robj);
 			break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 96447e1d4c9c..30951b593809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 		return -EINVAL;
 
 	/* A shared bo cannot be migrated to VRAM */
-	if (bo->prime_shared_count || bo->tbo.base.import_attach) {
+	if (bo->tbo.base.import_attach) {
 		if (domain & AMDGPU_GEM_DOMAIN_GTT)
 			domain = AMDGPU_GEM_DOMAIN_GTT;
 		else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index b35962702278..55d7e90eaa72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -98,7 +98,6 @@ struct amdgpu_bo {
 	struct ttm_buffer_object	tbo;
 	struct ttm_bo_kmap_obj		kmap;
 	u64				flags;
-	unsigned			prime_shared_count;
 	/* per VM structure for page tables and with virtual addresses */
 	struct amdgpu_vm_bo_base	*vm_bo;
 	/* Constant after initialization */
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Change how amdgpu stores fences in dma_resv objects
  2021-06-10  9:17 ` Christian König
@ 2021-06-10 16:34   ` Michel Dänzer
  -1 siblings, 0 replies; 58+ messages in thread
From: Michel Dänzer @ 2021-06-10 16:34 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On 2021-06-10 11:17 a.m., Christian König wrote:
> Since we can't find a consensus on how to move forward with the dma_resv object I concentrated on changing the approach for amdgpu first.
> 
> This new approach changes how the driver stores the command submission fence in the dma_resv object in DMA-buf exported BOs.
> 
> For exported BOs we now store the CS fence in a dma_fence_chain container and assign that one to the exclusive fences slot.
> 
> During synchronization this dma_fence_chain container is unpacked again and the containing fences handled individually.
> 
> This has a little bit more overhead than the old approach, but it allows for waiting for the exclusive slot for writes again.

Nice!

This seems to work as expected with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880: Some buffers now don't poll readable at first, until the GPU is done processing them.


Unfortunately, as expected, without a high priority context for the compositor which can preempt client drawing, this isn't enough to prevent slow clients from slowing down the compositor as well. But it should already help for fullscreen apps where the compositor can directly scan out the client buffers at least.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Change how amdgpu stores fences in dma_resv objects
  2021-06-10 16:34   ` Michel Dänzer
@ 2021-06-10 16:44     ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-10 16:44 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx, dri-devel

Am 10.06.21 um 18:34 schrieb Michel Dänzer:
> On 2021-06-10 11:17 a.m., Christian König wrote:
>> Since we can't find a consensus on how to move forward with the dma_resv object I concentrated on changing the approach for amdgpu first.
>>
>> This new approach changes how the driver stores the command submission fence in the dma_resv object in DMA-buf exported BOs.
>>
>> For exported BOs we now store the CS fence in a dma_fence_chain container and assign that one to the exclusive fences slot.
>>
>> During synchronization this dma_fence_chain container is unpacked again and the containing fences handled individually.
>>
>> This has a little bit more overhead than the old approach, but it allows for waiting for the exclusive slot for writes again.
> Nice!
>
> This seems to work as expected with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880: Some buffers now don't poll readable at first, until the GPU is done processing them.

Well I'm still pretty sure that any polling on the CPU should be 
avoided, but yes it is nice to have that working now in general.

> Unfortunately, as expected, without a high priority context for the compositor which can preempt client drawing, this isn't enough to prevent slow clients from slowing down the compositor as well. But it should already help for fullscreen apps where the compositor can directly scan out the client buffers at least.

I have seen patches for this flying by internally, but not sure about 
the status.

Christian.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/7] dma-buf: some dma_fence_chain improvements
  2021-06-10  9:17   ` Christian König
@ 2021-06-11  7:49     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  7:49 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:17:54AM +0200, Christian König wrote:
> The callback and the irq work are never used at the same
> time. Putting them into a union saves us 24 bytes and
> makes the structure only 120 bytes in size.

Yeah pushing below 128 bytes makes sense.

> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-fence-chain.c | 2 +-
>  include/linux/dma-fence-chain.h   | 8 +++++---
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index 7d129e68ac70..1b4cb3e5cec9 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -137,6 +137,7 @@ static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>  	struct dma_fence_chain *chain;
>  
>  	chain = container_of(cb, typeof(*chain), cb);
> +	init_irq_work(&chain->work, dma_fence_chain_irq_work);
>  	irq_work_queue(&chain->work);
>  	dma_fence_put(f);
>  }
> @@ -239,7 +240,6 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
>  	rcu_assign_pointer(chain->prev, prev);
>  	chain->fence = fence;
>  	chain->prev_seqno = 0;
> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
>  
>  	/* Try to reuse the context of the previous chain node. */
>  	if (prev_chain && __dma_fence_is_later(seqno, prev->seqno, prev->ops)) {
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 10462a029da2..9d6a062be640 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -25,12 +25,14 @@
>   */
>  struct dma_fence_chain {
>  	struct dma_fence base;
> -	spinlock_t lock;
>  	struct dma_fence __rcu *prev;
>  	u64 prev_seqno;
>  	struct dma_fence *fence;
> -	struct dma_fence_cb cb;
> -	struct irq_work work;

Can you pls pull the kerneldoc inline here for these too and extend the
comments that @work is only used from the callback, at which point we
don't need @cb anymore? For union lifetime tricks we really should
document this in the datastructure docs.

With that:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I also think it'd be good to specify this clearly in the kerneldoc for
dma_fence_add_callback() with something like:

"Note that the registered @cb structure is no longer in use by the
signalling code by the time @func is called, and can therefore be used
again. This is useful when @func launches a work item because it allows us
to put both the struct dma_fence_cb and the work struct (e.g. struct
work_struct) into a union to save space."

Feel free to include this in this patch or do it as a separate one.

Cheers, Daniel

> +	union {
> +		struct dma_fence_cb cb;
> +		struct irq_work work;
> +	};
> +	spinlock_t lock;
>  };
>  
>  extern const struct dma_fence_ops dma_fence_chain_ops;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free
  2021-06-10  9:17   ` Christian König
@ 2021-06-11  7:54     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  7:54 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:17:55AM +0200, Christian König wrote:
> Add a common allocation helper, cleaning up the mix of kzalloc/kmalloc
> and removing some unused code in the selftest.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/st-dma-fence-chain.c          | 16 ++++----------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 ++--
>  drivers/gpu/drm/drm_syncobj.c                 |  6 ++---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  6 ++---
>  drivers/gpu/drm/msm/msm_gem_submit.c          |  6 ++---
>  include/linux/dma-fence-chain.h               | 22 +++++++++++++++++++
>  6 files changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> index 9525f7f56119..8ce1ea59d31b 100644
> --- a/drivers/dma-buf/st-dma-fence-chain.c
> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> @@ -58,28 +58,20 @@ static struct dma_fence *mock_fence(void)
>  	return &f->base;
>  }
>  
> -static inline struct mock_chain {
> -	struct dma_fence_chain base;
> -} *to_mock_chain(struct dma_fence *f) {
> -	return container_of(f, struct mock_chain, base.base);
> -}
> -
>  static struct dma_fence *mock_chain(struct dma_fence *prev,
>  				    struct dma_fence *fence,
>  				    u64 seqno)
>  {
> -	struct mock_chain *f;
> +	struct dma_fence_chain *f;
>  
> -	f = kmalloc(sizeof(*f), GFP_KERNEL);
> +	f = dma_fence_chain_alloc();
>  	if (!f)
>  		return NULL;
>  
> -	dma_fence_chain_init(&f->base,
> -			     dma_fence_get(prev),
> -			     dma_fence_get(fence),
> +	dma_fence_chain_init(f, dma_fence_get(prev), dma_fence_get(fence),
>  			     seqno);
>  
> -	return &f->base.base;
> +	return &f->base;
>  }
>  
>  static int sanitycheck(void *arg)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 90136f9dedd6..325e82621467 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1124,7 +1124,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>  
>  		dep->chain = NULL;
>  		if (syncobj_deps[i].point) {
> -			dep->chain = kmalloc(sizeof(*dep->chain), GFP_KERNEL);
> +			dep->chain = dma_fence_chain_alloc();
>  			if (!dep->chain)
>  				return -ENOMEM;
>  		}
> @@ -1132,7 +1132,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>  		dep->syncobj = drm_syncobj_find(p->filp,
>  						syncobj_deps[i].handle);
>  		if (!dep->syncobj) {
> -			kfree(dep->chain);
> +			dma_fence_chain_free(dep->chain);
>  			return -EINVAL;
>  		}
>  		dep->point = syncobj_deps[i].point;
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index fdd2ec87cdd1..1c5b9ef6da37 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -861,7 +861,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>  				     &fence);
>  	if (ret)
>  		goto err;
> -	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +	chain = dma_fence_chain_alloc();
>  	if (!chain) {
>  		ret = -ENOMEM;
>  		goto err1;
> @@ -1402,10 +1402,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>  		goto err_points;
>  	}
>  	for (i = 0; i < args->count_handles; i++) {
> -		chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +		chains[i] = dma_fence_chain_alloc();
>  		if (!chains[i]) {
>  			for (j = 0; j < i; j++)
> -				kfree(chains[j]);
> +				dma_fence_chain_free(chains[j]);
>  			ret = -ENOMEM;
>  			goto err_chains;
>  		}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 66789111a24b..a22cb86730b3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
>  	while (n--) {
>  		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>  		dma_fence_put(fences[n].dma_fence);
> -		kfree(fences[n].chain_fence);
> +		dma_fence_chain_free(fences[n].chain_fence);
>  	}
>  	kvfree(fences);
>  }
> @@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>  				return -EINVAL;
>  			}
>  
> -			f->chain_fence =
> -				kmalloc(sizeof(*f->chain_fence),
> -					GFP_KERNEL);
> +			f->chain_fence = dma_fence_chain_alloc();
>  			if (!f->chain_fence) {
>  				drm_syncobj_put(syncobj);
>  				dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 5480852bdeda..6ff6df6c4791 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>  				break;
>  			}
>  
> -			post_deps[i].chain =
> -				kmalloc(sizeof(*post_deps[i].chain),
> -				        GFP_KERNEL);
> +			post_deps[i].chain = dma_fence_chain_alloc();
>  			if (!post_deps[i].chain) {
>  				ret = -ENOMEM;
>  				break;
> @@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>  
>  	if (ret) {
>  		for (j = 0; j <= i; ++j) {
> -			kfree(post_deps[j].chain);
> +			dma_fence_chain_free(post_deps[j].chain);
>  			if (post_deps[j].syncobj)
>  				drm_syncobj_put(post_deps[j].syncobj);
>  		}
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 9d6a062be640..5f45689a6d2e 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -12,6 +12,7 @@
>  
>  #include <linux/dma-fence.h>
>  #include <linux/irq_work.h>
> +#include <linux/slab.h>
>  
>  /**
>   * struct dma_fence_chain - fence to represent an node of a fence chain
> @@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
>  	return container_of(fence, struct dma_fence_chain, base);
>  }
>  
> +/**
> + * dma_fence_chain_alloc
> + *
> + * Returns a new dma_fence_chain object

... or NULL on failure.

> + */
> +static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
> +{
> +	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +};
> +
> +/**
> + * dma_fence_chain_free
> + * @chain: chain node to free
> + *
> + * Frees up an allocated but not used dma_fence_chain node.
> + */
> +static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
> +{
> +	kfree(chain);

kfree_rcu, and I guess this means this patch here should be cc: stable.

This is kinda why I'm questioning whether this "dma_fence are protected by
rcu" cross driver api is really a good idea. We largely get it wrong in
the details in a _ton_ of places.

With the details fixed:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +
>  /**
>   * dma_fence_chain_for_each - iterate over all fences in chain
>   * @iter: current fence
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free
@ 2021-06-11  7:54     ` Daniel Vetter
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  7:54 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel, daniel

On Thu, Jun 10, 2021 at 11:17:55AM +0200, Christian König wrote:
> Add a common allocation helper. Cleaning up the mix of kzalloc/kmalloc
> and some unused code in the selftest.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/st-dma-fence-chain.c          | 16 ++++----------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 ++--
>  drivers/gpu/drm/drm_syncobj.c                 |  6 ++---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  6 ++---
>  drivers/gpu/drm/msm/msm_gem_submit.c          |  6 ++---
>  include/linux/dma-fence-chain.h               | 22 +++++++++++++++++++
>  6 files changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> index 9525f7f56119..8ce1ea59d31b 100644
> --- a/drivers/dma-buf/st-dma-fence-chain.c
> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> @@ -58,28 +58,20 @@ static struct dma_fence *mock_fence(void)
>  	return &f->base;
>  }
>  
> -static inline struct mock_chain {
> -	struct dma_fence_chain base;
> -} *to_mock_chain(struct dma_fence *f) {
> -	return container_of(f, struct mock_chain, base.base);
> -}
> -
>  static struct dma_fence *mock_chain(struct dma_fence *prev,
>  				    struct dma_fence *fence,
>  				    u64 seqno)
>  {
> -	struct mock_chain *f;
> +	struct dma_fence_chain *f;
>  
> -	f = kmalloc(sizeof(*f), GFP_KERNEL);
> +	f = dma_fence_chain_alloc();
>  	if (!f)
>  		return NULL;
>  
> -	dma_fence_chain_init(&f->base,
> -			     dma_fence_get(prev),
> -			     dma_fence_get(fence),
> +	dma_fence_chain_init(f, dma_fence_get(prev), dma_fence_get(fence),
>  			     seqno);
>  
> -	return &f->base.base;
> +	return &f->base;
>  }
>  
>  static int sanitycheck(void *arg)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 90136f9dedd6..325e82621467 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1124,7 +1124,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>  
>  		dep->chain = NULL;
>  		if (syncobj_deps[i].point) {
> -			dep->chain = kmalloc(sizeof(*dep->chain), GFP_KERNEL);
> +			dep->chain = dma_fence_chain_alloc();
>  			if (!dep->chain)
>  				return -ENOMEM;
>  		}
> @@ -1132,7 +1132,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>  		dep->syncobj = drm_syncobj_find(p->filp,
>  						syncobj_deps[i].handle);
>  		if (!dep->syncobj) {
> -			kfree(dep->chain);
> +			dma_fence_chain_free(dep->chain);
>  			return -EINVAL;
>  		}
>  		dep->point = syncobj_deps[i].point;
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index fdd2ec87cdd1..1c5b9ef6da37 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -861,7 +861,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>  				     &fence);
>  	if (ret)
>  		goto err;
> -	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +	chain = dma_fence_chain_alloc();
>  	if (!chain) {
>  		ret = -ENOMEM;
>  		goto err1;
> @@ -1402,10 +1402,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>  		goto err_points;
>  	}
>  	for (i = 0; i < args->count_handles; i++) {
> -		chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +		chains[i] = dma_fence_chain_alloc();
>  		if (!chains[i]) {
>  			for (j = 0; j < i; j++)
> -				kfree(chains[j]);
> +				dma_fence_chain_free(chains[j]);
>  			ret = -ENOMEM;
>  			goto err_chains;
>  		}
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 66789111a24b..a22cb86730b3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
>  	while (n--) {
>  		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>  		dma_fence_put(fences[n].dma_fence);
> -		kfree(fences[n].chain_fence);
> +		dma_fence_chain_free(fences[n].chain_fence);
>  	}
>  	kvfree(fences);
>  }
> @@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>  				return -EINVAL;
>  			}
>  
> -			f->chain_fence =
> -				kmalloc(sizeof(*f->chain_fence),
> -					GFP_KERNEL);
> +			f->chain_fence = dma_fence_chain_alloc();
>  			if (!f->chain_fence) {
>  				drm_syncobj_put(syncobj);
>  				dma_fence_put(fence);
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 5480852bdeda..6ff6df6c4791 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>  				break;
>  			}
>  
> -			post_deps[i].chain =
> -				kmalloc(sizeof(*post_deps[i].chain),
> -				        GFP_KERNEL);
> +			post_deps[i].chain = dma_fence_chain_alloc();
>  			if (!post_deps[i].chain) {
>  				ret = -ENOMEM;
>  				break;
> @@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>  
>  	if (ret) {
>  		for (j = 0; j <= i; ++j) {
> -			kfree(post_deps[j].chain);
> +			dma_fence_chain_free(post_deps[j].chain);
>  			if (post_deps[j].syncobj)
>  				drm_syncobj_put(post_deps[j].syncobj);
>  		}
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 9d6a062be640..5f45689a6d2e 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -12,6 +12,7 @@
>  
>  #include <linux/dma-fence.h>
>  #include <linux/irq_work.h>
> +#include <linux/slab.h>
>  
>  /**
>   * struct dma_fence_chain - fence to represent a node of a fence chain
> @@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
>  	return container_of(fence, struct dma_fence_chain, base);
>  }
>  
> +/**
> + * dma_fence_chain_alloc
> + *
> + * Returns a new dma_fence_chain object

... or NULL on failure.

> + */
> +static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
> +{
> +	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> +};
> +
> +/**
> + * dma_fence_chain_free
> + * @chain: chain node to free
> + *
> + * Frees up an allocated but unused dma_fence_chain node.
> + */
> +static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
> +{
> +	kfree(chain);

kfree_rcu, and I guess this means this patch here should be cc: stable.

This is kinda why I'm questioning whether this "dma_fence are protected by
rcu" cross driver api is really a good idea. We largely get it wrong in
the details in a _ton_ of places.
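What the kfree_rcu fix buys can be modelled in plain userspace C. All names below are illustrative stand-ins for the kernel pieces, not the real API: the RCU grace period is modelled as a deferred-free list that is only drained once no lockless reader can still hold a pointer to the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions for illustration, not the kernel API). */
struct rcu_head { struct rcu_head *next; };

struct dma_fence       { struct rcu_head rcu; };  /* rcu first member here */
struct dma_fence_chain { struct dma_fence base; };

static struct rcu_head *deferred;   /* frees waiting for the grace period */

/* model of kfree_rcu(): defer the free instead of doing it immediately */
static void kfree_rcu_model(struct rcu_head *head)
{
	head->next = deferred;
	deferred = head;
}

/* model of a grace period ending: now the deferred frees are safe */
static void synchronize_rcu_model(void)
{
	while (deferred) {
		struct rcu_head *head = deferred;

		deferred = head->next;
		free(head);     /* rcu is the first member, same address */
	}
}

/* the shape of the fix: free through RCU so a concurrent lockless chain
 * walker never dereferences freed memory */
static void dma_fence_chain_free_model(struct dma_fence_chain *chain)
{
	kfree_rcu_model(&chain->base.rcu);
}
```

The real fix is a `kfree_rcu(chain, base.rcu)` style one-liner; the model only shows why an immediate `kfree()` is unsafe while readers may still be traversing the chain under RCU.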

With the details fixed:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +
>  /**
>   * dma_fence_chain_for_each - iterate over all fences in chain
>   * @iter: current fence
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/7] dma-buf: add dma_fence_chain_alloc/free self tests
  2021-06-10  9:17   ` Christian König
@ 2021-06-11  7:58     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  7:58 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:17:56AM +0200, Christian König wrote:
> Exercise the newly added functions.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

I have honestly no idea what this checks. Spawning a few threads to
validate kmalloc/kfree feels a bit silly. Now testing whether we correctly
rcu-delay the freeing here would make some sense, but even that feels a
bit silly.

I guess if you want this, explain with comments what it does and why.
-Daniel


> ---
>  drivers/dma-buf/st-dma-fence-chain.c | 48 ++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> index 8ce1ea59d31b..855c129c6093 100644
> --- a/drivers/dma-buf/st-dma-fence-chain.c
> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> @@ -95,6 +95,53 @@ static int sanitycheck(void *arg)
>  	return err;
>  }
>  
> +static int __alloc_free(void *arg)
> +{
> +	atomic_t *counter = arg;
> +	int i, j;
> +
> +	for (i = 0; i < 1024; ++i) {
> +		struct dma_fence_chain *chains[64];
> +
> +		for (j = 0; j < ARRAY_SIZE(chains); ++j)
> +			chains[j] = dma_fence_chain_alloc();
> +
> +		for (j = 0; j < ARRAY_SIZE(chains); ++j)
> +			dma_fence_chain_free(chains[j]);
> +
> +		atomic_add(ARRAY_SIZE(chains), counter);
> +	}
> +	return 0;
> +}
> +
> +static int alloc_free(void *arg)
> +{
> +	struct task_struct *threads[8];
> +	atomic_t counter = ATOMIC_INIT(0);
> +	int i, err = 0;
> +
> +	for (i = 0; i < ARRAY_SIZE(threads); i++) {
> +		threads[i] = kthread_run(__alloc_free, &counter, "dmabuf/%d",
> +					 i);
> +		if (IS_ERR(threads[i])) {
> +			err = PTR_ERR(threads[i]);
> +			break;
> +		}
> +	}
> +
> +	while (i--) {
> +		int ret;
> +
> +		ret = kthread_stop(threads[i]);
> +		if (ret && !err)
> +			err = ret;
> +	}
> +
> +	pr_info("Completed %u cycles\n", atomic_read(&counter));
> +
> +	return err;
> +}
> +
>  struct fence_chains {
>  	unsigned int chain_length;
>  	struct dma_fence **fences;
> @@ -677,6 +724,7 @@ int dma_fence_chain(void)
>  {
>  	static const struct subtest tests[] = {
>  		SUBTEST(sanitycheck),
> +		SUBTEST(alloc_free),
>  		SUBTEST(find_seqno),
>  		SUBTEST(find_signaled),
>  		SUBTEST(find_out_of_order),
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-10  9:17   ` Christian König
@ 2021-06-11  8:58     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  8:58 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:17:57AM +0200, Christian König wrote:
> Add some rather sophisticated lockless garbage collection
> for dma_fence_chain objects.
> 
> For this keep all initialized dma_fence_chain nodes on a
> queue and trigger garbage collection before a new one is
> allocated.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

Uh hand-rolled lockless list, I'm not a fan.

But the real question here is why? This is a global list, so it's going to
look great on your desktop, but gpus are for servers now and those are
NUMA. So just from that pov doing garbage-collection individually still
feels like a much better idea.

So what's the problem you're trying to solve here?
-Daniel
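For reference, the enqueue/dequeue scheme the patch below builds can be modelled with C11 atomics in userspace. The names and the single-threaded shape are illustrative assumptions, not the kernel code: enqueue xchg()s the tail slot and publishes into it, dequeue cmpxchg()s the head and deliberately leaves the last node in place so enqueue never races with an empty list.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *_Atomic next;
};

static struct node *_Atomic fifo_front;
static struct node *_Atomic *_Atomic fifo_back = &fifo_front;

static void enqueue(struct node *n)
{
	struct node *_Atomic *slot;

	atomic_store(&n->next, NULL);
	/* claim the current tail slot, then publish ourselves into it */
	slot = atomic_exchange(&fifo_back, &n->next);
	atomic_store(slot, n);
}

static struct node *dequeue(void)
{
	struct node *head = atomic_load(&fifo_front);
	struct node *next;

	if (!head)
		return NULL;
	/* never dequeue the last node: its next field may still be the tail */
	next = atomic_load(&head->next);
	if (!next)
		return NULL;
	if (atomic_compare_exchange_strong(&fifo_front, &head, next))
		return head;
	return NULL;    /* lost a race; a caller would simply retry later */
}
```

The "never dequeue the last node" rule mirrors the comment in the patch and is what lets the fifo stay lockless without an empty/non-empty transition to coordinate.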

> ---
>  drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
>  include/linux/dma-fence-chain.h   |   5 +
>  2 files changed, 142 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index 1b4cb3e5cec9..c2f0b69eabb7 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -9,8 +9,53 @@
>  
>  #include <linux/dma-fence-chain.h>
>  
> +static struct dma_fence_chain __rcu *fifo_front;
> +static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
> +
>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
>  
> +/**
> + * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
> + * @chain: the chain node to enqueue
> + *
> + * Add the chain node to the end of the gc fifo.
> + */
> +static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
> +{
> +	struct dma_fence_chain __rcu **tmp;
> +
> +	RCU_INIT_POINTER(chain->next, NULL);
> +	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
> +		   &chain->next);
> +
> +	/* This is intentionally unordered since we only need the fifo for gc */
> +	rcu_assign_pointer(*tmp, chain);
> +}
> +
> +/**
> + * dma_fence_chain_dequeue - dequeue a chain node for garbage collection
> + *
> + * Remove the first chain node from the gc fifo while making sure that we
> + * always keep at least one node on the fifo for the lockless implementation.
> + * Returns the dequeued chain node or NULL.
> + */
> +static struct dma_fence_chain *dma_fence_chain_dequeue(void)
> +{
> +	struct dma_fence_chain *chain, *tmp;
> +
> +	rcu_read_lock();
> +	chain = rcu_dereference(fifo_front);
> +	/* Never dequeue the last chain node for lockless fifo */
> +	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
> +		rcu_read_unlock();
> +		return NULL;
> +	}
> +	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
> +		      chain, rcu_access_pointer(chain->next));
> +	rcu_read_unlock();
> +	return tmp == chain ? chain : NULL;
> +}
> +
>  /**
>   * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
>   * @chain: chain node to get the previous node from
> @@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>  	return prev;
>  }
>  
> +/**
> + * dma_fence_chain_try_replace - try to replace the prev node
> + * @chain: Chain node we try to replace prev.
> + * @prev: the old prev node
> + *
> + * Try to replace the previous chain node when it or its containing fence is
> + * signaled. Returns true if we tried, false if we need to wait.
> + */
> +static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
> +					struct dma_fence *prev)
> +{
> +	struct dma_fence *replacement, *tmp;
> +	struct dma_fence_chain *prev_chain;
> +
> +	prev_chain = to_dma_fence_chain(prev);
> +	if (prev_chain) {
> +		if (!dma_fence_is_signaled(prev_chain->fence))
> +			return false;
> +
> +		replacement = dma_fence_chain_get_prev(prev_chain);
> +	} else {
> +		if (!dma_fence_is_signaled(prev))
> +			return false;
> +
> +		replacement = NULL;
> +	}
> +
> +	tmp = cmpxchg((struct dma_fence __force **)&chain->prev, prev,
> +		      replacement);
> +	if (tmp == prev)
> +		dma_fence_put(tmp);
> +	else
> +		dma_fence_put(replacement);
> +
> +	return true;
> +}
> +
>  /**
>   * dma_fence_chain_walk - chain walking function
>   * @fence: current chain node
> @@ -38,8 +120,8 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>   */
>  struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
>  {
> -	struct dma_fence_chain *chain, *prev_chain;
> -	struct dma_fence *prev, *replacement, *tmp;
> +	struct dma_fence_chain *chain;
> +	struct dma_fence *prev;
>  
>  	chain = to_dma_fence_chain(fence);
>  	if (!chain) {
> @@ -48,26 +130,8 @@ struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
>  	}
>  
>  	while ((prev = dma_fence_chain_get_prev(chain))) {
> -
> -		prev_chain = to_dma_fence_chain(prev);
> -		if (prev_chain) {
> -			if (!dma_fence_is_signaled(prev_chain->fence))
> -				break;
> -
> -			replacement = dma_fence_chain_get_prev(prev_chain);
> -		} else {
> -			if (!dma_fence_is_signaled(prev))
> -				break;
> -
> -			replacement = NULL;
> -		}
> -
> -		tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
> -			      prev, replacement);
> -		if (tmp == prev)
> -			dma_fence_put(tmp);
> -		else
> -			dma_fence_put(replacement);
> +		if (!dma_fence_chain_try_replace(chain, prev))
> +			break;
>  		dma_fence_put(prev);
>  	}
>  
> @@ -205,7 +269,21 @@ static void dma_fence_chain_release(struct dma_fence *fence)
>  	dma_fence_put(prev);
>  
>  	dma_fence_put(chain->fence);
> -	dma_fence_free(fence);
> +	WRITE_ONCE(chain->fence, NULL);
> +
> +	/*
> +	 * Don't garbage collect here to avoid recursion and potential stack
> +	 * overflow.
> +	 */
> +	chain = dma_fence_chain_dequeue();
> +	if (!chain)
> +		return;
> +
> +	if (kref_read(&chain->base.refcount) ||
> +	    READ_ONCE(chain->fence))
> +		dma_fence_chain_enqueue(chain);
> +	else
> +		dma_fence_free(&chain->base);
>  }
>  
>  const struct dma_fence_ops dma_fence_chain_ops = {
> @@ -218,6 +296,40 @@ const struct dma_fence_ops dma_fence_chain_ops = {
>  };
>  EXPORT_SYMBOL(dma_fence_chain_ops);
>  
> +/**
> + * dma_fence_chain_garbage_collect - cleanup chain nodes
> + *
> + * Do some garbage collection and try to release chain nodes.
> + */
> +void dma_fence_chain_garbage_collect(void)
> +{
> +	struct dma_fence_chain *chain = dma_fence_chain_dequeue();
> +
> +	if (!chain)
> +		return;
> +
> +	if (!dma_fence_get_rcu(&chain->base)) {
> +		/* Unused, check if it's also clean */
> +		if (likely(!READ_ONCE(chain->fence))) {
> +			dma_fence_free(&chain->base);
> +			return;
> +		}
> +
> +	} else {
> +		struct dma_fence *prev;
> +
> +		/* Used, do some chain walk */
> +		prev = dma_fence_chain_get_prev(chain);
> +		if (prev) {
> +			dma_fence_chain_try_replace(chain, prev);
> +			dma_fence_put(prev);
> +		}
> +		dma_fence_put(&chain->base);
> +	}
> +	dma_fence_chain_enqueue(chain);
> +}
> +EXPORT_SYMBOL(dma_fence_chain_garbage_collect);
> +
>  /**
>   * dma_fence_chain_init - initialize a fence chain
>   * @chain: the chain node to initialize
> @@ -254,5 +366,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
>  
>  	dma_fence_init(&chain->base, &dma_fence_chain_ops,
>  		       &chain->lock, context, seqno);
> +	dma_fence_chain_enqueue(chain);
>  }
> +
>  EXPORT_SYMBOL(dma_fence_chain_init);
> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> index 5f45689a6d2e..b412b5396434 100644
> --- a/include/linux/dma-fence-chain.h
> +++ b/include/linux/dma-fence-chain.h
> @@ -19,6 +19,7 @@
>   * @base: fence base class
>   * @lock: spinlock for fence handling
>   * @prev: previous fence of the chain
> + * @next: next chain node for garbage collection
>   * @prev_seqno: original previous seqno before garbage collection
>   * @fence: encapsulated fence
>   * @cb: callback structure for signaling
> @@ -27,6 +28,7 @@
>  struct dma_fence_chain {
>  	struct dma_fence base;
>  	struct dma_fence __rcu *prev;
> +	struct dma_fence_chain __rcu *next;
>  	u64 prev_seqno;
>  	struct dma_fence *fence;
>  	union {
> @@ -38,6 +40,8 @@ struct dma_fence_chain {
>  
>  extern const struct dma_fence_ops dma_fence_chain_ops;
>  
> +void dma_fence_chain_garbage_collect(void);
> +
>  /**
>   * to_dma_fence_chain - cast a fence to a dma_fence_chain
>   * @fence: fence to cast to a dma_fence_array
> @@ -61,6 +65,7 @@ to_dma_fence_chain(struct dma_fence *fence)
>   */
>  static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
>  {
> +	dma_fence_chain_garbage_collect();
>  	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>  };
>  
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-10  9:17   ` Christian König
@ 2021-06-11  9:07     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  9:07 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
> Unwrap the explicit fence if it is a dma_fence_chain and
> sync to the first fence not matching the owner rules.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
>  1 file changed, 68 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index 1b2ceccaf5b0..862eb3c1c4c5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -28,6 +28,8 @@
>   *    Christian König <christian.koenig@amd.com>
>   */
>  
> +#include <linux/dma-fence-chain.h>
> +
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
>  #include "amdgpu_amdkfd.h"
> @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
>  	return amdgpu_sync_fence(sync, fence);
>  }
>  
> +/* Determine based on the owner and mode if we should sync to a fence or not */
> +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> +				   enum amdgpu_sync_mode mode,
> +				   void *owner, struct dma_fence *f)
> +{
> +	void *fence_owner = amdgpu_sync_get_owner(f);
> +
> +	/* Always sync to moves, no matter what */
> +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> +		return true;
> +
> +	/* We only want to trigger KFD eviction fences on
> +	 * evict or move jobs. Skip KFD fences otherwise.
> +	 */
> +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> +		return false;
> +
> +	/* Never sync to VM updates either. */
> +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> +		return false;
> +
> +	/* Ignore fences depending on the sync mode */
> +	switch (mode) {
> +	case AMDGPU_SYNC_ALWAYS:
> +		return true;
> +
> +	case AMDGPU_SYNC_NE_OWNER:
> +		if (amdgpu_sync_same_dev(adev, f) &&
> +		    fence_owner == owner)
> +			return false;
> +		break;
> +
> +	case AMDGPU_SYNC_EQ_OWNER:
> +		if (amdgpu_sync_same_dev(adev, f) &&
> +		    fence_owner != owner)
> +			return false;
> +		break;
> +
> +	case AMDGPU_SYNC_EXPLICIT:
> +		return false;
> +	}
> +
> +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> +	     "Adding eviction fence to sync obj");
> +	return true;
> +}
> +
>  /**
>   * amdgpu_sync_resv - sync to a reservation object
>   *
> @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
>  
>  	/* always sync to the exclusive fence */
>  	f = dma_resv_excl_fence(resv);
> -	r = amdgpu_sync_fence(sync, f);
> +	dma_fence_chain_for_each(f, f) {

Jason has some helper for deep-walking fence chains/arrays here I think.
Might want to look into that, so that we have some consistency in how we
pile up multiple exclusive fences.
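The unwrap loop being reviewed boils down to: walk the chain containers, test each contained fence against the owner rules, and sync to the first hit. A userspace sketch of that shape, with all types and the owner rule as illustrative assumptions rather than the amdgpu code:

```c
#include <stdbool.h>
#include <stddef.h>

struct fence { int owner; };

struct chain_node {
	struct fence payload;       /* the contained fence */
	struct chain_node *prev;    /* older links in the chain */
};

/* stand-in for amdgpu_sync_test_fence(): sync to anything not owned by us */
static bool must_sync(const struct fence *f, int owner)
{
	return f->owner != owner;
}

/* walk newest to oldest, return the first fence the rules say to sync to */
static struct fence *first_foreign_fence(struct chain_node *chain, int owner)
{
	for (; chain; chain = chain->prev)
		if (must_sync(&chain->payload, owner))
			return &chain->payload;
	return NULL;
}
```

The patch's `dma_fence_chain_for_each()` plays the role of the `prev` walk here; the real test also distinguishes KFD/VM owners and the sync mode rather than a single owner compare.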

Anyway pretty much one of the versions I had in mind too, except I didn't
type it up.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
> +
> +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> +					   chain->fence : f)) {
> +			r = amdgpu_sync_fence(sync, f);
> +			dma_fence_put(f);
> +			if (r)
> +				return r;
> +			break;
> +		}
> +	}
>  
>  	flist = dma_resv_shared_list(resv);
> -	if (!flist || r)
> -		return r;
> +	if (!flist)
> +		return 0;
>  
>  	for (i = 0; i < flist->shared_count; ++i) {
> -		void *fence_owner;
> -
>  		f = rcu_dereference_protected(flist->shared[i],
>  					      dma_resv_held(resv));
>  
> -		fence_owner = amdgpu_sync_get_owner(f);
> -
> -		/* Always sync to moves, no matter what */
> -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
>  			r = amdgpu_sync_fence(sync, f);
>  			if (r)
> -				break;
> -		}
> -
> -		/* We only want to trigger KFD eviction fences on
> -		 * evict or move jobs. Skip KFD fences otherwise.
> -		 */
> -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> -			continue;
> -
> -		/* Never sync to VM updates either. */
> -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> -			continue;
> -
> -		/* Ignore fences depending on the sync mode */
> -		switch (mode) {
> -		case AMDGPU_SYNC_ALWAYS:
> -			break;
> -
> -		case AMDGPU_SYNC_NE_OWNER:
> -			if (amdgpu_sync_same_dev(adev, f) &&
> -			    fence_owner == owner)
> -				continue;
> -			break;
> -
> -		case AMDGPU_SYNC_EQ_OWNER:
> -			if (amdgpu_sync_same_dev(adev, f) &&
> -			    fence_owner != owner)
> -				continue;
> -			break;
> -
> -		case AMDGPU_SYNC_EXPLICIT:
> -			continue;
> +				return r;
>  		}
> -
> -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> -		     "Adding eviction fence to sync obj");
> -		r = amdgpu_sync_fence(sync, f);
> -		if (r)
> -			break;
>  	}
> -	return r;
> +	return 0;
>  }
>  
>  /**
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 7/7] drm/amdgpu: rework dma_resv handling
  2021-06-10  9:18   ` Christian König
@ 2021-06-11  9:17     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11  9:17 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Thu, Jun 10, 2021 at 11:18:00AM +0200, Christian König wrote:
> Drop the workaround and instead implement a better solution.
> 
> Basically we are now chaining all submissions using a dma_fence_chain
> container and adding them as exclusive fence to the dma_resv object.
> 
> This way other drivers can still sync to the single exclusive fence
> while amdgpu only sync to fences from different processes.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 54 +++++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 ---------------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>  6 files changed, 47 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> index a130e766cbdb..c905a4cfc173 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>  struct amdgpu_bo_list_entry {
>  	struct ttm_validate_buffer	tv;
>  	struct amdgpu_bo_va		*bo_va;
> +	struct dma_fence_chain		*chain;
>  	uint32_t			priority;
>  	struct page			**user_pages;
>  	bool				user_invalidated;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 325e82621467..f6f3029f958d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -587,6 +587,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>  		goto out;
>  	}
>  
> +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +
> +		e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +
> +		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> +			e->chain = dma_fence_chain_alloc();
> +			if (!e->chain) {
> +				r = -ENOMEM;
> +				goto error_validate;
> +			}
> +		}
> +	}
> +
>  	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
>  					  &p->bytes_moved_vis_threshold);
>  	p->bytes_moved = 0;
> @@ -614,15 +628,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>  	gws = p->bo_list->gws_obj;
>  	oa = p->bo_list->oa_obj;
>  
> -	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> -		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> -		/* Make sure we use the exclusive slot for shared BOs */
> -		if (bo->prime_shared_count)
> -			e->tv.num_shared = 0;
> -		e->bo_va = amdgpu_vm_bo_find(vm, bo);
> -	}
> -
>  	if (gds) {
>  		p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
>  		p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> @@ -644,8 +649,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>  	}
>  
>  error_validate:
> -	if (r)
> +	if (r) {
> +		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +			dma_fence_chain_free(e->chain);
> +			e->chain = NULL;
> +		}
>  		ttm_eu_backoff_reservation(&p->ticket, &p->validated);
> +	}
>  out:
>  	return r;
>  }
> @@ -685,9 +695,17 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
>  {
>  	unsigned i;
>  
> -	if (error && backoff)
> +	if (error && backoff) {
> +		struct amdgpu_bo_list_entry *e;
> +
> +		amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> +			dma_fence_chain_free(e->chain);
> +			e->chain = NULL;
> +		}
> +
>  		ttm_eu_backoff_reservation(&parser->ticket,
>  					   &parser->validated);
> +	}
>  
>  	for (i = 0; i < parser->num_post_deps; i++) {
>  		drm_syncobj_put(parser->post_deps[i].syncobj);
> @@ -1260,6 +1278,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>  
>  	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
>  
> +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +		struct dma_resv *resv = e->tv.bo->base.resv;
> +		struct dma_fence_chain *chain = e->chain;
> +
> +		if (!chain)
> +			continue;
> +
> +		dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
> +				     dma_fence_get(p->fence), 1);
> +
> +		rcu_assign_pointer(resv->fence_excl, &chain->base);

So for safety, since this is now a driver interface, I was thinking of a
helper which does this entire dance and _also_ adds the new fence to the
shared slots. This way we don't let drivers open-code this rather tricky
operation.

You probably also want to combine this with ttm_eu_fence_buffer_objects
below, I think, so you're not walking this list twice.

In other words, I'd put the fence_chain parameter into struct
ttm_validate_buffer and then let ttm_eu_fence_buffer_objects() do this, by
calling a new dma_resv_add_shared_excl_fence(). Ideally the same helper
that Jason's sync_file import will also use. We want to hide the inner
workings of dma_resv as much as possible, of course.

The other thing I'm wondering is whether this needs to be wrapped in a
single seqlock, or whether we don't have ordering issues if we split up
the update here? Setting the exclusive fence before we've also added it to
the shared slot can at least violate the "shared fences signal after the
exclusive one if both are present" rule.

Finally I guess we need to sprinkle the manual garbage collector somewhere
here too.
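
To make that concrete, a rough kernel-style pseudocode sketch of such a
helper (the name dma_resv_add_shared_excl_fence() and the locking details
are hypothetical here, not an existing dma_resv API):

```c
/* Hypothetical helper, sketch only: wrap the previous exclusive fence
 * and the new fence in a chain node and publish it, so that drivers
 * don't open-code the rcu_assign_pointer() dance from the patch above.
 */
static void dma_resv_add_shared_excl_fence(struct dma_resv *obj,
					   struct dma_fence_chain *chain,
					   struct dma_fence *fence)
{
	dma_resv_assert_held(obj);

	dma_fence_chain_init(chain, dma_resv_excl_fence(obj),
			     dma_fence_get(fence), 1);

	/* Ideally both stores would sit in one seqcount write section so
	 * that readers never observe the new exclusive fence without the
	 * matching shared slot entry.
	 */
	dma_resv_add_shared_fence(obj, fence);
	rcu_assign_pointer(obj->fence_excl, &chain->base);
}
```

ttm_eu_fence_buffer_objects() could then call something like this per
buffer instead of the per-driver loop in amdgpu_cs_submit().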

But aside from the interface polish and correctness against races I think
this looks solid in the big picture.

Cheers, Daniel

> +		e->chain = NULL;
> +	}
> +
>  	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>  	mutex_unlock(&p->adev->notifier_lock);
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index c3053b83b80c..23219fc3b28c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -42,48 +42,6 @@
>  #include <linux/pci-p2pdma.h>
>  #include <linux/pm_runtime.h>
>  
> -static int
> -__dma_resv_make_exclusive(struct dma_resv *obj)
> -{
> -	struct dma_fence **fences;
> -	unsigned int count;
> -	int r;
> -
> -	if (!dma_resv_shared_list(obj)) /* no shared fences to convert */
> -		return 0;
> -
> -	r = dma_resv_get_fences(obj, NULL, &count, &fences);
> -	if (r)
> -		return r;
> -
> -	if (count == 0) {
> -		/* Now that was unexpected. */
> -	} else if (count == 1) {
> -		dma_resv_add_excl_fence(obj, fences[0]);
> -		dma_fence_put(fences[0]);
> -		kfree(fences);
> -	} else {
> -		struct dma_fence_array *array;
> -
> -		array = dma_fence_array_create(count, fences,
> -					       dma_fence_context_alloc(1), 0,
> -					       false);
> -		if (!array)
> -			goto err_fences_put;
> -
> -		dma_resv_add_excl_fence(obj, &array->base);
> -		dma_fence_put(&array->base);
> -	}
> -
> -	return 0;
> -
> -err_fences_put:
> -	while (count--)
> -		dma_fence_put(fences[count]);
> -	kfree(fences);
> -	return -ENOMEM;
> -}
> -
>  /**
>   * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
>   *
> @@ -110,24 +68,6 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>  	if (r < 0)
>  		goto out;
>  
> -	r = amdgpu_bo_reserve(bo, false);
> -	if (unlikely(r != 0))
> -		goto out;
> -
> -	/*
> -	 * We only create shared fences for internal use, but importers
> -	 * of the dmabuf rely on exclusive fences for implicitly
> -	 * tracking write hazards. As any of the current fences may
> -	 * correspond to a write, we need to convert all existing
> -	 * fences on the reservation object into a single exclusive
> -	 * fence.
> -	 */
> -	r = __dma_resv_make_exclusive(bo->tbo.base.resv);
> -	if (r)
> -		goto out;
> -
> -	bo->prime_shared_count++;
> -	amdgpu_bo_unreserve(bo);
>  	return 0;
>  
>  out:
> @@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
>  	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>  
> -	if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
> -		bo->prime_shared_count--;
> -
>  	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
>  	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
>  }
> @@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
>  	bo = gem_to_amdgpu_bo(gobj);
>  	bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
>  	bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
> -	if (dma_buf->ops != &amdgpu_dmabuf_ops)
> -		bo->prime_shared_count = 1;
>  
>  	dma_resv_unlock(resv);
>  	return gobj;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 1c3e3b608332..5d82120fc160 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>  		break;
>  	}
>  	case AMDGPU_GEM_OP_SET_PLACEMENT:
> -		if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
> +		if (robj->tbo.base.import_attach &&
> +		    args->value & AMDGPU_GEM_DOMAIN_VRAM) {
>  			r = -EINVAL;
>  			amdgpu_bo_unreserve(robj);
>  			break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 96447e1d4c9c..30951b593809 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
>  		return -EINVAL;
>  
>  	/* A shared bo cannot be migrated to VRAM */
> -	if (bo->prime_shared_count || bo->tbo.base.import_attach) {
> +	if (bo->tbo.base.import_attach) {
>  		if (domain & AMDGPU_GEM_DOMAIN_GTT)
>  			domain = AMDGPU_GEM_DOMAIN_GTT;
>  		else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index b35962702278..55d7e90eaa72 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -98,7 +98,6 @@ struct amdgpu_bo {
>  	struct ttm_buffer_object	tbo;
>  	struct ttm_bo_kmap_obj		kmap;
>  	u64				flags;
> -	unsigned			prime_shared_count;
>  	/* per VM structure for page tables and with virtual addresses */
>  	struct amdgpu_vm_bo_base	*vm_bo;
>  	/* Constant after initialization */
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/7] dma-buf: add dma_fence_chain_alloc/free self tests
  2021-06-11  7:58     ` Daniel Vetter
@ 2021-06-11 10:04       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-11 10:04 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

On 11.06.21 at 09:58, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 11:17:56AM +0200, Christian König wrote:
>> Exercise the newly added functions.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
> I have honestly no idea what this checks. Spawning a few threads to
> validate kmalloc/kfree feels a bit silly. Now testing whether we correctly
> rcu-delay the freeing here would make some sense, but even that feels a
> bit silly.
>
> I guess if you want this explain with comments what it does and why?

This was solely to figure out whether the garbage collection is working 
properly and how much overhead it generates.

No actual need to commit it.

Christian.

> -Daniel
>
>
>> ---
>>   drivers/dma-buf/st-dma-fence-chain.c | 48 ++++++++++++++++++++++++++++
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
>> index 8ce1ea59d31b..855c129c6093 100644
>> --- a/drivers/dma-buf/st-dma-fence-chain.c
>> +++ b/drivers/dma-buf/st-dma-fence-chain.c
>> @@ -95,6 +95,53 @@ static int sanitycheck(void *arg)
>>   	return err;
>>   }
>>   
>> +static int __alloc_free(void *arg)
>> +{
>> +	atomic_t *counter = arg;
>> +	int i, j;
>> +
>> +	for (i = 0; i < 1024; ++i) {
>> +		struct dma_fence_chain *chains[64];
>> +
>> +		for (j = 0; j < ARRAY_SIZE(chains); ++j)
>> +			chains[j] = dma_fence_chain_alloc();
>> +
>> +		for (j = 0; j < ARRAY_SIZE(chains); ++j)
>> +			dma_fence_chain_free(chains[j]);
>> +
>> +		atomic_add(ARRAY_SIZE(chains), counter);
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int alloc_free(void *arg)
>> +{
>> +	struct task_struct *threads[8];
>> +	atomic_t counter = ATOMIC_INIT(0);
>> +	int i, err = 0;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(threads); i++) {
>> +		threads[i] = kthread_run(__alloc_free, &counter, "dmabuf/%d",
>> +					 i);
>> +		if (IS_ERR(threads[i])) {
>> +			err = PTR_ERR(threads[i]);
>> +			break;
>> +		}
>> +	}
>> +
>> +	while (i--) {
>> +		int ret;
>> +
>> +		ret = kthread_stop(threads[i]);
>> +		if (ret && !err)
>> +			err = ret;
>> +	}
>> +
>> +	pr_info("Completed %u cycles\n", atomic_read(&counter));
>> +
>> +	return err;
>> +}
>> +
>>   struct fence_chains {
>>   	unsigned int chain_length;
>>   	struct dma_fence **fences;
>> @@ -677,6 +724,7 @@ int dma_fence_chain(void)
>>   {
>>   	static const struct subtest tests[] = {
>>   		SUBTEST(sanitycheck),
>> +		SUBTEST(alloc_free),
>>   		SUBTEST(find_seqno),
>>   		SUBTEST(find_signaled),
>>   		SUBTEST(find_out_of_order),
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-11  8:58     ` Daniel Vetter
@ 2021-06-11 10:07       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-11 10:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

On 11.06.21 at 10:58, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 11:17:57AM +0200, Christian König wrote:
>> Add some rather sophisticated lockless garbage collection
>> for dma_fence_chain objects.
>>
>> For this, keep all initialized dma_fence_chain nodes on a
>> queue and trigger garbage collection before a new one is
>> allocated.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
> Uh hand-rolled lockless list, I'm not a fan.
>
> But the real question here is why? This is a global list, so it's going to
> look great on your desktop, but gpus are for servers now and those are
> NUMA. So just from that pov doing garbage-collection individually still
> feels like a much better idea.

Yeah, I was pondering that quite a bit as well. That's why I added the 
multi-threaded alloc/free test.

> So what's the problem you're trying to solve here?

I was not sure whether the chain would be garbage collected aggressively 
enough when used for the dma_resv exclusive fence slot.

But I think we could drop this for now and just see how it goes.

Christian.

> -Daniel
>
>> ---
>>   drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
>>   include/linux/dma-fence-chain.h   |   5 +
>>   2 files changed, 142 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>> index 1b4cb3e5cec9..c2f0b69eabb7 100644
>> --- a/drivers/dma-buf/dma-fence-chain.c
>> +++ b/drivers/dma-buf/dma-fence-chain.c
>> @@ -9,8 +9,53 @@
>>   
>>   #include <linux/dma-fence-chain.h>
>>   
>> +static struct dma_fence_chain __rcu *fifo_front;
>> +static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
>> +
>>   static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
>>   
>> +/**
>> + * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
>> + * @chain: the chain node to enqueue
>> + *
>> + * Add the chain node to the end of the gc fifo.
>> + */
>> +static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
>> +{
>> +	struct dma_fence_chain __rcu **tmp;
>> +
>> +	RCU_INIT_POINTER(chain->next, NULL);
>> +	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
>> +		   &chain->next);
>> +
>> +	/* This is intentionally unordered since we only need the fifo for gc */
>> +	rcu_assign_pointer(*tmp, chain);
>> +}
>> +
>> +/**
>> + * dma_fence_chain_dequeue - dequeue a chain node for garbage collection
>> + *
>> + * Remove the first chain node from the gc fifo while making sure to always
>> + * keep at least one node on the fifo, as required by the lockless fifo
>> + * implementation.
>> + * Returns the dequeued chain node or NULL.
>> + */
>> +static struct dma_fence_chain *dma_fence_chain_dequeue(void)
>> +{
>> +	struct dma_fence_chain *chain, *tmp;
>> +
>> +	rcu_read_lock();
>> +	chain = rcu_dereference(fifo_front);
>> +	/* Never dequeue the last chain node for lockless fifo */
>> +	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
>> +		rcu_read_unlock();
>> +		return NULL;
>> +	}
>> +	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
>> +		      chain, rcu_access_pointer(chain->next));
>> +	rcu_read_unlock();
>> +	return tmp == chain ? chain : NULL;
>> +}
>> +
>>   /**
>>    * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
>>    * @chain: chain node to get the previous node from
>> @@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>>   	return prev;
>>   }
>>   
>> +/**
>> + * dma_fence_chain_try_replace - try to replace the prev node
>> + * @chain: Chain node we try to replace prev.
>> + * @prev: the old prev node
>> + *
>> + * Try to replace the previous chain node when it or its containing fence is
>> + * signaled. Returns true if we tried, false if we need to wait.
>> + */
>> +static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
>> +					struct dma_fence *prev)
>> +{
>> +	struct dma_fence *replacement, *tmp;
>> +	struct dma_fence_chain *prev_chain;
>> +
>> +	prev_chain = to_dma_fence_chain(prev);
>> +	if (prev_chain) {
>> +		if (!dma_fence_is_signaled(prev_chain->fence))
>> +			return false;
>> +
>> +		replacement = dma_fence_chain_get_prev(prev_chain);
>> +	} else {
>> +		if (!dma_fence_is_signaled(prev))
>> +			return false;
>> +
>> +		replacement = NULL;
>> +	}
>> +
>> +	tmp = cmpxchg((struct dma_fence __force **)&chain->prev, prev,
>> +		      replacement);
>> +	if (tmp == prev)
>> +		dma_fence_put(tmp);
>> +	else
>> +		dma_fence_put(replacement);
>> +
>> +	return true;
>> +}
>> +
>>   /**
>>    * dma_fence_chain_walk - chain walking function
>>    * @fence: current chain node
>> @@ -38,8 +120,8 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>>    */
>>   struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
>>   {
>> -	struct dma_fence_chain *chain, *prev_chain;
>> -	struct dma_fence *prev, *replacement, *tmp;
>> +	struct dma_fence_chain *chain;
>> +	struct dma_fence *prev;
>>   
>>   	chain = to_dma_fence_chain(fence);
>>   	if (!chain) {
>> @@ -48,26 +130,8 @@ struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
>>   	}
>>   
>>   	while ((prev = dma_fence_chain_get_prev(chain))) {
>> -
>> -		prev_chain = to_dma_fence_chain(prev);
>> -		if (prev_chain) {
>> -			if (!dma_fence_is_signaled(prev_chain->fence))
>> -				break;
>> -
>> -			replacement = dma_fence_chain_get_prev(prev_chain);
>> -		} else {
>> -			if (!dma_fence_is_signaled(prev))
>> -				break;
>> -
>> -			replacement = NULL;
>> -		}
>> -
>> -		tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
>> -			      prev, replacement);
>> -		if (tmp == prev)
>> -			dma_fence_put(tmp);
>> -		else
>> -			dma_fence_put(replacement);
>> +		if (!dma_fence_chain_try_replace(chain, prev))
>> +			break;
>>   		dma_fence_put(prev);
>>   	}
>>   
>> @@ -205,7 +269,21 @@ static void dma_fence_chain_release(struct dma_fence *fence)
>>   	dma_fence_put(prev);
>>   
>>   	dma_fence_put(chain->fence);
>> -	dma_fence_free(fence);
>> +	WRITE_ONCE(chain->fence, NULL);
>> +
>> +	/*
>> +	 * Don't garbage collect here to avoid recursion and potential stack
>> +	 * overflow.
>> +	 */
>> +	chain = dma_fence_chain_dequeue();
>> +	if (!chain)
>> +		return;
>> +
>> +	if (kref_read(&chain->base.refcount) ||
>> +	    READ_ONCE(chain->fence))
>> +		dma_fence_chain_enqueue(chain);
>> +	else
>> +		dma_fence_free(&chain->base);
>>   }
>>   
>>   const struct dma_fence_ops dma_fence_chain_ops = {
>> @@ -218,6 +296,40 @@ const struct dma_fence_ops dma_fence_chain_ops = {
>>   };
>>   EXPORT_SYMBOL(dma_fence_chain_ops);
>>   
>> +/**
>> + * dma_fence_chain_garbage_collect - cleanup chain nodes
>> + *
>> + * Do some garbage collection and try to release chain nodes.
>> + */
>> +void dma_fence_chain_garbage_collect(void)
>> +{
>> +	struct dma_fence_chain *chain = dma_fence_chain_dequeue();
>> +
>> +	if (!chain)
>> +		return;
>> +
>> +	if (!dma_fence_get_rcu(&chain->base)) {
>> +		/* Unused, check if it's also clean */
>> +		if (likely(!READ_ONCE(chain->fence))) {
>> +			dma_fence_free(&chain->base);
>> +			return;
>> +		}
>> +
>> +	} else {
>> +		struct dma_fence *prev;
>> +
>> +		/* Used, do some chain walk */
>> +		prev = dma_fence_chain_get_prev(chain);
>> +		if (prev) {
>> +			dma_fence_chain_try_replace(chain, prev);
>> +			dma_fence_put(prev);
>> +		}
>> +		dma_fence_put(&chain->base);
>> +	}
>> +	dma_fence_chain_enqueue(chain);
>> +}
>> +EXPORT_SYMBOL(dma_fence_chain_garbage_collect);
>> +
>>   /**
>>    * dma_fence_chain_init - initialize a fence chain
>>    * @chain: the chain node to initialize
>> @@ -254,5 +366,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
>>   
>>   	dma_fence_init(&chain->base, &dma_fence_chain_ops,
>>   		       &chain->lock, context, seqno);
>> +	dma_fence_chain_enqueue(chain);
>>   }
>> +
>>   EXPORT_SYMBOL(dma_fence_chain_init);
>> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
>> index 5f45689a6d2e..b412b5396434 100644
>> --- a/include/linux/dma-fence-chain.h
>> +++ b/include/linux/dma-fence-chain.h
>> @@ -19,6 +19,7 @@
>>    * @base: fence base class
>>    * @lock: spinlock for fence handling
>>    * @prev: previous fence of the chain
>> + * @next: next chain node for garbage collection
>>    * @prev_seqno: original previous seqno before garbage collection
>>    * @fence: encapsulated fence
>>    * @cb: callback structure for signaling
>> @@ -27,6 +28,7 @@
>>   struct dma_fence_chain {
>>   	struct dma_fence base;
>>   	struct dma_fence __rcu *prev;
>> +	struct dma_fence_chain __rcu *next;
>>   	u64 prev_seqno;
>>   	struct dma_fence *fence;
>>   	union {
>> @@ -38,6 +40,8 @@ struct dma_fence_chain {
>>   
>>   extern const struct dma_fence_ops dma_fence_chain_ops;
>>   
>> +void dma_fence_chain_garbage_collect(void);
>> +
>>   /**
>>    * to_dma_fence_chain - cast a fence to a dma_fence_chain
>>    * @fence: fence to cast to a dma_fence_array
>> @@ -61,6 +65,7 @@ to_dma_fence_chain(struct dma_fence *fence)
>>    */
>>   static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
>>   {
>> +	dma_fence_chain_garbage_collect();
>>   	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>   };
>>   
>> -- 
>> 2.25.1
>>



* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-11  9:07     ` Daniel Vetter
@ 2021-06-11 10:09       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-11 10:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

On 11.06.21 at 11:07, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
>> Unwrap the explicit fence if it is a dma_fence_chain and
>> sync to the first fence not matching the owner rules.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
>>   1 file changed, 68 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index 1b2ceccaf5b0..862eb3c1c4c5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -28,6 +28,8 @@
>>    *    Christian König <christian.koenig@amd.com>
>>    */
>>   
>> +#include <linux/dma-fence-chain.h>
>> +
>>   #include "amdgpu.h"
>>   #include "amdgpu_trace.h"
>>   #include "amdgpu_amdkfd.h"
>> @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
>>   	return amdgpu_sync_fence(sync, fence);
>>   }
>>   
>> +/* Determine based on the owner and mode if we should sync to a fence or not */
>> +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
>> +				   enum amdgpu_sync_mode mode,
>> +				   void *owner, struct dma_fence *f)
>> +{
>> +	void *fence_owner = amdgpu_sync_get_owner(f);
>> +
>> +	/* Always sync to moves, no matter what */
>> +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
>> +		return true;
>> +
>> +	/* We only want to trigger KFD eviction fences on
>> +	 * evict or move jobs. Skip KFD fences otherwise.
>> +	 */
>> +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>> +		return false;
>> +
>> +	/* Never sync to VM updates either. */
>> +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>> +		return false;
>> +
>> +	/* Ignore fences depending on the sync mode */
>> +	switch (mode) {
>> +	case AMDGPU_SYNC_ALWAYS:
>> +		return true;
>> +
>> +	case AMDGPU_SYNC_NE_OWNER:
>> +		if (amdgpu_sync_same_dev(adev, f) &&
>> +		    fence_owner == owner)
>> +			return false;
>> +		break;
>> +
>> +	case AMDGPU_SYNC_EQ_OWNER:
>> +		if (amdgpu_sync_same_dev(adev, f) &&
>> +		    fence_owner != owner)
>> +			return false;
>> +		break;
>> +
>> +	case AMDGPU_SYNC_EXPLICIT:
>> +		return false;
>> +	}
>> +
>> +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>> +	     "Adding eviction fence to sync obj");
>> +	return true;
>> +}
>> +
>>   /**
>>    * amdgpu_sync_resv - sync to a reservation object
>>    *
>> @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
>>   
>>   	/* always sync to the exclusive fence */
>>   	f = dma_resv_excl_fence(resv);
>> -	r = amdgpu_sync_fence(sync, f);
>> +	dma_fence_chain_for_each(f, f) {
> Jason has some helper for deep-walking fence chains/arrays here I think.
> Might want to look into that, so that we have some consistency in how we
> pile up multiple exclusive fences.

Well those helpers are not from Jason, but from me :)

But no, for now the deep inspection is not really helpful, since 
grabbing a reference to a specific chain node is what makes the 
handling easier and faster here.

Thinking more about it, that should also make it possible for the 
garbage collection to kick in properly.

>
> Anyway pretty much one of the versions I had in mind too, except I didn't
> type it up.
>
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Thanks,
Christian.

>
>> +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
>> +
>> +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
>> +					   chain->fence : f)) {
>> +			r = amdgpu_sync_fence(sync, f);
>> +			dma_fence_put(f);
>> +			if (r)
>> +				return r;
>> +			break;
>> +		}
>> +	}
>>   
>>   	flist = dma_resv_shared_list(resv);
>> -	if (!flist || r)
>> -		return r;
>> +	if (!flist)
>> +		return 0;
>>   
>>   	for (i = 0; i < flist->shared_count; ++i) {
>> -		void *fence_owner;
>> -
>>   		f = rcu_dereference_protected(flist->shared[i],
>>   					      dma_resv_held(resv));
>>   
>> -		fence_owner = amdgpu_sync_get_owner(f);
>> -
>> -		/* Always sync to moves, no matter what */
>> -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
>> +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
>>   			r = amdgpu_sync_fence(sync, f);
>>   			if (r)
>> -				break;
>> -		}
>> -
>> -		/* We only want to trigger KFD eviction fences on
>> -		 * evict or move jobs. Skip KFD fences otherwise.
>> -		 */
>> -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>> -			continue;
>> -
>> -		/* Never sync to VM updates either. */
>> -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>> -			continue;
>> -
>> -		/* Ignore fences depending on the sync mode */
>> -		switch (mode) {
>> -		case AMDGPU_SYNC_ALWAYS:
>> -			break;
>> -
>> -		case AMDGPU_SYNC_NE_OWNER:
>> -			if (amdgpu_sync_same_dev(adev, f) &&
>> -			    fence_owner == owner)
>> -				continue;
>> -			break;
>> -
>> -		case AMDGPU_SYNC_EQ_OWNER:
>> -			if (amdgpu_sync_same_dev(adev, f) &&
>> -			    fence_owner != owner)
>> -				continue;
>> -			break;
>> -
>> -		case AMDGPU_SYNC_EXPLICIT:
>> -			continue;
>> +				return r;
>>   		}
>> -
>> -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>> -		     "Adding eviction fence to sync obj");
>> -		r = amdgpu_sync_fence(sync, f);
>> -		if (r)
>> -			break;
>>   	}
>> -	return r;
>> +	return 0;
>>   }
>>   
>>   /**
>> -- 
>> 2.25.1
>>



* Re: [PATCH 7/7] drm/amdgpu: rework dma_resv handling
  2021-06-11  9:17     ` Daniel Vetter
@ 2021-06-11 10:12       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-11 10:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

On 11.06.21 at 11:17, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 11:18:00AM +0200, Christian König wrote:
>> Drop the workaround and instead implement a better solution.
>>
>> Basically we are now chaining all submissions using a dma_fence_chain
>> container and adding them as exclusive fence to the dma_resv object.
>>
>> This way other drivers can still sync to the single exclusive fence
>> while amdgpu only syncs to fences from different processes.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 54 +++++++++++++----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 ---------------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  3 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>>   6 files changed, 47 insertions(+), 79 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> index a130e766cbdb..c905a4cfc173 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>>   struct amdgpu_bo_list_entry {
>>   	struct ttm_validate_buffer	tv;
>>   	struct amdgpu_bo_va		*bo_va;
>> +	struct dma_fence_chain		*chain;
>>   	uint32_t			priority;
>>   	struct page			**user_pages;
>>   	bool				user_invalidated;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 325e82621467..f6f3029f958d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -587,6 +587,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>   		goto out;
>>   	}
>>   
>> +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> +		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>> +
>> +		e->bo_va = amdgpu_vm_bo_find(vm, bo);
>> +
>> +		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
>> +			e->chain = dma_fence_chain_alloc();
>> +			if (!e->chain) {
>> +				r = -ENOMEM;
>> +				goto error_validate;
>> +			}
>> +		}
>> +	}
>> +
>>   	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
>>   					  &p->bytes_moved_vis_threshold);
>>   	p->bytes_moved = 0;
>> @@ -614,15 +628,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>   	gws = p->bo_list->gws_obj;
>>   	oa = p->bo_list->oa_obj;
>>   
>> -	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> -		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>> -
>> -		/* Make sure we use the exclusive slot for shared BOs */
>> -		if (bo->prime_shared_count)
>> -			e->tv.num_shared = 0;
>> -		e->bo_va = amdgpu_vm_bo_find(vm, bo);
>> -	}
>> -
>>   	if (gds) {
>>   		p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
>>   		p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
>> @@ -644,8 +649,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>   	}
>>   
>>   error_validate:
>> -	if (r)
>> +	if (r) {
>> +		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> +			dma_fence_chain_free(e->chain);
>> +			e->chain = NULL;
>> +		}
>>   		ttm_eu_backoff_reservation(&p->ticket, &p->validated);
>> +	}
>>   out:
>>   	return r;
>>   }
>> @@ -685,9 +695,17 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
>>   {
>>   	unsigned i;
>>   
>> -	if (error && backoff)
>> +	if (error && backoff) {
>> +		struct amdgpu_bo_list_entry *e;
>> +
>> +		amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
>> +			dma_fence_chain_free(e->chain);
>> +			e->chain = NULL;
>> +		}
>> +
>>   		ttm_eu_backoff_reservation(&parser->ticket,
>>   					   &parser->validated);
>> +	}
>>   
>>   	for (i = 0; i < parser->num_post_deps; i++) {
>>   		drm_syncobj_put(parser->post_deps[i].syncobj);
>> @@ -1260,6 +1278,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>   
>>   	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
>>   
>> +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> +		struct dma_resv *resv = e->tv.bo->base.resv;
>> +		struct dma_fence_chain *chain = e->chain;
>> +
>> +		if (!chain)
>> +			continue;
>> +
>> +		dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
>> +				     dma_fence_get(p->fence), 1);
>> +
>> +		rcu_assign_pointer(resv->fence_excl, &chain->base);
> So for safety, since this is now driver interface, I was thinking of a
> helper which does this entire dance and _also_ adds the new fence to the
> shared slots. This way we don't let drivers open-code this rather tricky
> operation.

Well, I only see this as a workaround for amdgpu and don't want to leak 
it into other drivers.

If somebody else wants to adopt it we should probably consider fixing 
the dma_resv object instead.

> You probably also want to then combine this with
> ttm_eu_fence_buffer_objects below I think so you're not walking this list
> 2x.
>
> Iow I'd put the fence_chain parameter into struct ttm_validate_buffer and
> then let ttm_eu_fence_buffer_objects() do this, by calling a new
> dma_resv_add_shared_excl_fence. Ideally the same helper that also Jason's
> sync_file import will use. We want to hide the inner workings of dma_resv
> as much as possible ofc.
>
> The other thing I'm wondering is whether this needs to be wrapped in a
> single seqlock, or whether we don't have ordering issues if we split up the
> update here? Setting the exclusive fence before we have also added it to the
> shared slot can at least violate the "shared fences signal after exclusive
> if both are present" rule.

Uff, good point.

> Finally I guess we need to sprinkle the manual garbage collector somewhere
> here too.

That is done automatically when somebody iterates the chain node.
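
To make the point concrete, here is a rough standalone model of that 
behavior (all names are invented for illustration; the real code is 
dma_fence_chain_walk()/dma_fence_chain_for_each() in 
drivers/dma-buf/dma-fence-chain.c): walking the chain unlinks and frees 
links whose fence has already signaled, so no separate collection pass 
is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Toy model of a fence chain link.  "signaled" stands in for
 * dma_fence_is_signaled() on the fence the link carries.
 */
struct toy_chain {
	struct toy_chain *prev;	/* older link in the chain */
	bool signaled;
};

/*
 * Advance past @node toward older links, collecting dead ones as we
 * go -- this is the "automatic garbage collection on iteration".
 */
static struct toy_chain *toy_chain_walk(struct toy_chain *node)
{
	while (node->prev && node->prev->signaled) {
		struct toy_chain *stale = node->prev;

		node->prev = stale->prev;	/* splice the dead link out */
		free(stale);			/* collected during the walk */
	}
	return node->prev;
}
```

A walk over a three-link chain whose middle fence has signaled returns 
the oldest link directly, with the signaled middle link reclaimed on 
the way.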

Christian.

>
> But aside from the interface polish and correctness against races I think
> this looks solid in the big picture.
>
> Cheers, Daniel
>
>> +		e->chain = NULL;
>> +	}
>> +
>>   	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>   	mutex_unlock(&p->adev->notifier_lock);
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> index c3053b83b80c..23219fc3b28c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> @@ -42,48 +42,6 @@
>>   #include <linux/pci-p2pdma.h>
>>   #include <linux/pm_runtime.h>
>>   
>> -static int
>> -__dma_resv_make_exclusive(struct dma_resv *obj)
>> -{
>> -	struct dma_fence **fences;
>> -	unsigned int count;
>> -	int r;
>> -
>> -	if (!dma_resv_shared_list(obj)) /* no shared fences to convert */
>> -		return 0;
>> -
>> -	r = dma_resv_get_fences(obj, NULL, &count, &fences);
>> -	if (r)
>> -		return r;
>> -
>> -	if (count == 0) {
>> -		/* Now that was unexpected. */
>> -	} else if (count == 1) {
>> -		dma_resv_add_excl_fence(obj, fences[0]);
>> -		dma_fence_put(fences[0]);
>> -		kfree(fences);
>> -	} else {
>> -		struct dma_fence_array *array;
>> -
>> -		array = dma_fence_array_create(count, fences,
>> -					       dma_fence_context_alloc(1), 0,
>> -					       false);
>> -		if (!array)
>> -			goto err_fences_put;
>> -
>> -		dma_resv_add_excl_fence(obj, &array->base);
>> -		dma_fence_put(&array->base);
>> -	}
>> -
>> -	return 0;
>> -
>> -err_fences_put:
>> -	while (count--)
>> -		dma_fence_put(fences[count]);
>> -	kfree(fences);
>> -	return -ENOMEM;
>> -}
>> -
>>   /**
>>    * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
>>    *
>> @@ -110,24 +68,6 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>>   	if (r < 0)
>>   		goto out;
>>   
>> -	r = amdgpu_bo_reserve(bo, false);
>> -	if (unlikely(r != 0))
>> -		goto out;
>> -
>> -	/*
>> -	 * We only create shared fences for internal use, but importers
>> -	 * of the dmabuf rely on exclusive fences for implicitly
>> -	 * tracking write hazards. As any of the current fences may
>> -	 * correspond to a write, we need to convert all existing
>> -	 * fences on the reservation object into a single exclusive
>> -	 * fence.
>> -	 */
>> -	r = __dma_resv_make_exclusive(bo->tbo.base.resv);
>> -	if (r)
>> -		goto out;
>> -
>> -	bo->prime_shared_count++;
>> -	amdgpu_bo_unreserve(bo);
>>   	return 0;
>>   
>>   out:
>> @@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
>>   	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>>   	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>   
>> -	if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
>> -		bo->prime_shared_count--;
>> -
>>   	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
>>   	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
>>   }
>> @@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
>>   	bo = gem_to_amdgpu_bo(gobj);
>>   	bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
>>   	bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
>> -	if (dma_buf->ops != &amdgpu_dmabuf_ops)
>> -		bo->prime_shared_count = 1;
>>   
>>   	dma_resv_unlock(resv);
>>   	return gobj;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 1c3e3b608332..5d82120fc160 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>>   		break;
>>   	}
>>   	case AMDGPU_GEM_OP_SET_PLACEMENT:
>> -		if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
>> +		if (robj->tbo.base.import_attach &&
>> +		    args->value & AMDGPU_GEM_DOMAIN_VRAM) {
>>   			r = -EINVAL;
>>   			amdgpu_bo_unreserve(robj);
>>   			break;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 96447e1d4c9c..30951b593809 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
>>   		return -EINVAL;
>>   
>>   	/* A shared bo cannot be migrated to VRAM */
>> -	if (bo->prime_shared_count || bo->tbo.base.import_attach) {
>> +	if (bo->tbo.base.import_attach) {
>>   		if (domain & AMDGPU_GEM_DOMAIN_GTT)
>>   			domain = AMDGPU_GEM_DOMAIN_GTT;
>>   		else
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> index b35962702278..55d7e90eaa72 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> @@ -98,7 +98,6 @@ struct amdgpu_bo {
>>   	struct ttm_buffer_object	tbo;
>>   	struct ttm_bo_kmap_obj		kmap;
>>   	u64				flags;
>> -	unsigned			prime_shared_count;
>>   	/* per VM structure for page tables and with virtual addresses */
>>   	struct amdgpu_vm_bo_base	*vm_bo;
>>   	/* Constant after initialization */
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 58+ messages in thread


* Re: [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free
  2021-06-11  7:54     ` Daniel Vetter
@ 2021-06-11 11:48       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-11 11:48 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

On 11.06.21 at 09:54, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 11:17:55AM +0200, Christian König wrote:
>> Add a common allocation helper. Cleaning up the mix of kzalloc/kmalloc
>> and some unused code in the selftest.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/dma-buf/st-dma-fence-chain.c          | 16 ++++----------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 ++--
>>   drivers/gpu/drm/drm_syncobj.c                 |  6 ++---
>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  6 ++---
>>   drivers/gpu/drm/msm/msm_gem_submit.c          |  6 ++---
>>   include/linux/dma-fence-chain.h               | 22 +++++++++++++++++++
>>   6 files changed, 35 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
>> index 9525f7f56119..8ce1ea59d31b 100644
>> --- a/drivers/dma-buf/st-dma-fence-chain.c
>> +++ b/drivers/dma-buf/st-dma-fence-chain.c
>> @@ -58,28 +58,20 @@ static struct dma_fence *mock_fence(void)
>>   	return &f->base;
>>   }
>>   
>> -static inline struct mock_chain {
>> -	struct dma_fence_chain base;
>> -} *to_mock_chain(struct dma_fence *f) {
>> -	return container_of(f, struct mock_chain, base.base);
>> -}
>> -
>>   static struct dma_fence *mock_chain(struct dma_fence *prev,
>>   				    struct dma_fence *fence,
>>   				    u64 seqno)
>>   {
>> -	struct mock_chain *f;
>> +	struct dma_fence_chain *f;
>>   
>> -	f = kmalloc(sizeof(*f), GFP_KERNEL);
>> +	f = dma_fence_chain_alloc();
>>   	if (!f)
>>   		return NULL;
>>   
>> -	dma_fence_chain_init(&f->base,
>> -			     dma_fence_get(prev),
>> -			     dma_fence_get(fence),
>> +	dma_fence_chain_init(f, dma_fence_get(prev), dma_fence_get(fence),
>>   			     seqno);
>>   
>> -	return &f->base.base;
>> +	return &f->base;
>>   }
>>   
>>   static int sanitycheck(void *arg)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 90136f9dedd6..325e82621467 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -1124,7 +1124,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>>   
>>   		dep->chain = NULL;
>>   		if (syncobj_deps[i].point) {
>> -			dep->chain = kmalloc(sizeof(*dep->chain), GFP_KERNEL);
>> +			dep->chain = dma_fence_chain_alloc();
>>   			if (!dep->chain)
>>   				return -ENOMEM;
>>   		}
>> @@ -1132,7 +1132,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
>>   		dep->syncobj = drm_syncobj_find(p->filp,
>>   						syncobj_deps[i].handle);
>>   		if (!dep->syncobj) {
>> -			kfree(dep->chain);
>> +			dma_fence_chain_free(dep->chain);
>>   			return -EINVAL;
>>   		}
>>   		dep->point = syncobj_deps[i].point;
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index fdd2ec87cdd1..1c5b9ef6da37 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -861,7 +861,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>   				     &fence);
>>   	if (ret)
>>   		goto err;
>> -	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>> +	chain = dma_fence_chain_alloc();
>>   	if (!chain) {
>>   		ret = -ENOMEM;
>>   		goto err1;
>> @@ -1402,10 +1402,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>   		goto err_points;
>>   	}
>>   	for (i = 0; i < args->count_handles; i++) {
>> -		chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>> +		chains[i] = dma_fence_chain_alloc();
>>   		if (!chains[i]) {
>>   			for (j = 0; j < i; j++)
>> -				kfree(chains[j]);
>> +				dma_fence_chain_free(chains[j]);
>>   			ret = -ENOMEM;
>>   			goto err_chains;
>>   		}
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 66789111a24b..a22cb86730b3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
>>   	while (n--) {
>>   		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>   		dma_fence_put(fences[n].dma_fence);
>> -		kfree(fences[n].chain_fence);
>> +		dma_fence_chain_free(fences[n].chain_fence);
>>   	}
>>   	kvfree(fences);
>>   }
>> @@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>>   				return -EINVAL;
>>   			}
>>   
>> -			f->chain_fence =
>> -				kmalloc(sizeof(*f->chain_fence),
>> -					GFP_KERNEL);
>> +			f->chain_fence = dma_fence_chain_alloc();
>>   			if (!f->chain_fence) {
>>   				drm_syncobj_put(syncobj);
>>   				dma_fence_put(fence);
>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
>> index 5480852bdeda..6ff6df6c4791 100644
>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
>> @@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>>   				break;
>>   			}
>>   
>> -			post_deps[i].chain =
>> -				kmalloc(sizeof(*post_deps[i].chain),
>> -				        GFP_KERNEL);
>> +			post_deps[i].chain = dma_fence_chain_alloc();
>>   			if (!post_deps[i].chain) {
>>   				ret = -ENOMEM;
>>   				break;
>> @@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>>   
>>   	if (ret) {
>>   		for (j = 0; j <= i; ++j) {
>> -			kfree(post_deps[j].chain);
>> +			dma_fence_chain_free(post_deps[j].chain);
>>   			if (post_deps[j].syncobj)
>>   				drm_syncobj_put(post_deps[j].syncobj);
>>   		}
>> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
>> index 9d6a062be640..5f45689a6d2e 100644
>> --- a/include/linux/dma-fence-chain.h
>> +++ b/include/linux/dma-fence-chain.h
>> @@ -12,6 +12,7 @@
>>   
>>   #include <linux/dma-fence.h>
>>   #include <linux/irq_work.h>
>> +#include <linux/slab.h>
>>   
>>   /**
>>    * struct dma_fence_chain - fence to represent an node of a fence chain
>> @@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
>>   	return container_of(fence, struct dma_fence_chain, base);
>>   }
>>   
>> +/**
>> + * dma_fence_chain_alloc
>> + *
>> + * Returns a new dma_fence_chain object
> ... or NULL on failure.
>
>> + */
>> +static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
>> +{
>> +	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>> +};
>> +
>> +/**
>> + * dma_fence_chain_free
>> + * @chain: chain node to free
>> + *
>> + * Frees up an allocated but not used dma_fence_chain node.
>> + */
>> +static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
>> +{
>> +	kfree(chain);
> kfree_rcu, and I guess this means this patch here should be cc: stable.

Nope, kfree() is correct here.

This is to free up fences which have never been initialized, so an RCU 
grace period isn't necessary because nobody could potentially have a 
reference.
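
As a rough standalone sketch of that distinction (invented names, not 
the kernel API): a node that was never published where RCU readers 
could find it can be freed immediately, while a published one has to 
outlive a grace period -- kfree() vs. kfree_rcu() in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counters so the two disposal paths can be observed. */
static int freed_now;		/* immediate frees (models kfree)     */
static int freed_after_grace;	/* deferred frees (models kfree_rcu)  */

struct toy_node {
	bool published;	/* true once RCU readers may have seen it */
};

static void toy_dispose(struct toy_node *n)
{
	if (n->published) {
		/* Readers found it via RCU: must wait out a grace period. */
		freed_after_grace++;
	} else {
		/* Never visible to anyone: a plain, immediate free is safe. */
		freed_now++;
	}
	free(n);	/* toy "free"; real code would defer the published case */
}
```

Disposing one never-published node and one published node takes the 
immediate and the deferred path exactly once each.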

Christian.

> This is kinda why I'm questioning whether this "dma_fence are protected by
> rcu" cross driver api is really a good idea. We largely get it wrong in
> the details in a _ton_ of places.
>
> With the details fixed:
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
>> +
>>   /**
>>    * dma_fence_chain_for_each - iterate over all fences in chain
>>    * @iter: current fence
>> -- 
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 58+ messages in thread

>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
>>   	while (n--) {
>>   		drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
>>   		dma_fence_put(fences[n].dma_fence);
>> -		kfree(fences[n].chain_fence);
>> +		dma_fence_chain_free(fences[n].chain_fence);
>>   	}
>>   	kvfree(fences);
>>   }
>> @@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
>>   				return -EINVAL;
>>   			}
>>   
>> -			f->chain_fence =
>> -				kmalloc(sizeof(*f->chain_fence),
>> -					GFP_KERNEL);
>> +			f->chain_fence = dma_fence_chain_alloc();
>>   			if (!f->chain_fence) {
>>   				drm_syncobj_put(syncobj);
>>   				dma_fence_put(fence);
>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
>> index 5480852bdeda..6ff6df6c4791 100644
>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
>> @@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>>   				break;
>>   			}
>>   
>> -			post_deps[i].chain =
>> -				kmalloc(sizeof(*post_deps[i].chain),
>> -				        GFP_KERNEL);
>> +			post_deps[i].chain = dma_fence_chain_alloc();
>>   			if (!post_deps[i].chain) {
>>   				ret = -ENOMEM;
>>   				break;
>> @@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
>>   
>>   	if (ret) {
>>   		for (j = 0; j <= i; ++j) {
>> -			kfree(post_deps[j].chain);
>> +			dma_fence_chain_free(post_deps[j].chain);
>>   			if (post_deps[j].syncobj)
>>   				drm_syncobj_put(post_deps[j].syncobj);
>>   		}
>> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
>> index 9d6a062be640..5f45689a6d2e 100644
>> --- a/include/linux/dma-fence-chain.h
>> +++ b/include/linux/dma-fence-chain.h
>> @@ -12,6 +12,7 @@
>>   
>>   #include <linux/dma-fence.h>
>>   #include <linux/irq_work.h>
>> +#include <linux/slab.h>
>>   
>>   /**
>>    * struct dma_fence_chain - fence to represent an node of a fence chain
>> @@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
>>   	return container_of(fence, struct dma_fence_chain, base);
>>   }
>>   
>> +/**
>> + * dma_fence_chain_alloc
>> + *
>> + * Returns a new dma_fence_chain object
> ... or NULL on failure.
>
>> + */
>> +static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
>> +{
>> +	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>> +};
>> +
>> +/**
>> + * dma_fence_chain_free
>> + * @chain: chain node to free
>> + *
>> + * Frees up an allocated but not used dma_fence_chain node.
>> + */
>> +static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
>> +{
>> +	kfree(chain);
> kfree_rcu, and I guess this means this patch here should be cc: stable.

Nope, kfree() is correct here.

This is to free up fences which have never been initialized, so an RCU grace 
period isn't necessary because nobody could have taken a reference yet.

Christian.

> This is kinda why I'm questioning whether this "dma_fence are protected by
> rcu" cross driver api is really a good idea. We largely get it wrong in
> the details in a _ton_ of places.
>
> With the details fixed:
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
>> +
>>   /**
>>    * dma_fence_chain_for_each - iterate over all fences in chain
>>    * @iter: current fence
>> -- 
>> 2.25.1
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free
  2021-06-11 11:48       ` Christian König
@ 2021-06-11 14:40         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11 14:40 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx list, dri-devel

On Fri, Jun 11, 2021 at 1:48 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 11.06.21 um 09:54 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:17:55AM +0200, Christian König wrote:
> >> Add a common allocation helper. Cleaning up the mix of kzalloc/kmalloc
> >> and some unused code in the selftest.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> >> ---
> >>   drivers/dma-buf/st-dma-fence-chain.c          | 16 ++++----------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 ++--
> >>   drivers/gpu/drm/drm_syncobj.c                 |  6 ++---
> >>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  6 ++---
> >>   drivers/gpu/drm/msm/msm_gem_submit.c          |  6 ++---
> >>   include/linux/dma-fence-chain.h               | 22 +++++++++++++++++++
> >>   6 files changed, 35 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> >> index 9525f7f56119..8ce1ea59d31b 100644
> >> --- a/drivers/dma-buf/st-dma-fence-chain.c
> >> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> >> @@ -58,28 +58,20 @@ static struct dma_fence *mock_fence(void)
> >>      return &f->base;
> >>   }
> >>
> >> -static inline struct mock_chain {
> >> -    struct dma_fence_chain base;
> >> -} *to_mock_chain(struct dma_fence *f) {
> >> -    return container_of(f, struct mock_chain, base.base);
> >> -}
> >> -
> >>   static struct dma_fence *mock_chain(struct dma_fence *prev,
> >>                                  struct dma_fence *fence,
> >>                                  u64 seqno)
> >>   {
> >> -    struct mock_chain *f;
> >> +    struct dma_fence_chain *f;
> >>
> >> -    f = kmalloc(sizeof(*f), GFP_KERNEL);
> >> +    f = dma_fence_chain_alloc();
> >>      if (!f)
> >>              return NULL;
> >>
> >> -    dma_fence_chain_init(&f->base,
> >> -                         dma_fence_get(prev),
> >> -                         dma_fence_get(fence),
> >> +    dma_fence_chain_init(f, dma_fence_get(prev), dma_fence_get(fence),
> >>                           seqno);
> >>
> >> -    return &f->base.base;
> >> +    return &f->base;
> >>   }
> >>
> >>   static int sanitycheck(void *arg)
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> index 90136f9dedd6..325e82621467 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> @@ -1124,7 +1124,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
> >>
> >>              dep->chain = NULL;
> >>              if (syncobj_deps[i].point) {
> >> -                    dep->chain = kmalloc(sizeof(*dep->chain), GFP_KERNEL);
> >> +                    dep->chain = dma_fence_chain_alloc();
> >>                      if (!dep->chain)
> >>                              return -ENOMEM;
> >>              }
> >> @@ -1132,7 +1132,7 @@ static int amdgpu_cs_process_syncobj_timeline_out_dep(struct amdgpu_cs_parser *p
> >>              dep->syncobj = drm_syncobj_find(p->filp,
> >>                                              syncobj_deps[i].handle);
> >>              if (!dep->syncobj) {
> >> -                    kfree(dep->chain);
> >> +                    dma_fence_chain_free(dep->chain);
> >>                      return -EINVAL;
> >>              }
> >>              dep->point = syncobj_deps[i].point;
> >> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> >> index fdd2ec87cdd1..1c5b9ef6da37 100644
> >> --- a/drivers/gpu/drm/drm_syncobj.c
> >> +++ b/drivers/gpu/drm/drm_syncobj.c
> >> @@ -861,7 +861,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
> >>                                   &fence);
> >>      if (ret)
> >>              goto err;
> >> -    chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> >> +    chain = dma_fence_chain_alloc();
> >>      if (!chain) {
> >>              ret = -ENOMEM;
> >>              goto err1;
> >> @@ -1402,10 +1402,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> >>              goto err_points;
> >>      }
> >>      for (i = 0; i < args->count_handles; i++) {
> >> -            chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> >> +            chains[i] = dma_fence_chain_alloc();
> >>              if (!chains[i]) {
> >>                      for (j = 0; j < i; j++)
> >> -                            kfree(chains[j]);
> >> +                            dma_fence_chain_free(chains[j]);
> >>                      ret = -ENOMEM;
> >>                      goto err_chains;
> >>              }
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> index 66789111a24b..a22cb86730b3 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> @@ -2983,7 +2983,7 @@ __free_fence_array(struct eb_fence *fences, unsigned int n)
> >>      while (n--) {
> >>              drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2));
> >>              dma_fence_put(fences[n].dma_fence);
> >> -            kfree(fences[n].chain_fence);
> >> +            dma_fence_chain_free(fences[n].chain_fence);
> >>      }
> >>      kvfree(fences);
> >>   }
> >> @@ -3097,9 +3097,7 @@ add_timeline_fence_array(struct i915_execbuffer *eb,
> >>                              return -EINVAL;
> >>                      }
> >>
> >> -                    f->chain_fence =
> >> -                            kmalloc(sizeof(*f->chain_fence),
> >> -                                    GFP_KERNEL);
> >> +                    f->chain_fence = dma_fence_chain_alloc();
> >>                      if (!f->chain_fence) {
> >>                              drm_syncobj_put(syncobj);
> >>                              dma_fence_put(fence);
> >> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> >> index 5480852bdeda..6ff6df6c4791 100644
> >> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> >> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> >> @@ -586,9 +586,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
> >>                              break;
> >>                      }
> >>
> >> -                    post_deps[i].chain =
> >> -                            kmalloc(sizeof(*post_deps[i].chain),
> >> -                                    GFP_KERNEL);
> >> +                    post_deps[i].chain = dma_fence_chain_alloc();
> >>                      if (!post_deps[i].chain) {
> >>                              ret = -ENOMEM;
> >>                              break;
> >> @@ -605,7 +603,7 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
> >>
> >>      if (ret) {
> >>              for (j = 0; j <= i; ++j) {
> >> -                    kfree(post_deps[j].chain);
> >> +                    dma_fence_chain_free(post_deps[j].chain);
> >>                      if (post_deps[j].syncobj)
> >>                              drm_syncobj_put(post_deps[j].syncobj);
> >>              }
> >> diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> >> index 9d6a062be640..5f45689a6d2e 100644
> >> --- a/include/linux/dma-fence-chain.h
> >> +++ b/include/linux/dma-fence-chain.h
> >> @@ -12,6 +12,7 @@
> >>
> >>   #include <linux/dma-fence.h>
> >>   #include <linux/irq_work.h>
> >> +#include <linux/slab.h>
> >>
> >>   /**
> >>    * struct dma_fence_chain - fence to represent an node of a fence chain
> >> @@ -53,6 +54,27 @@ to_dma_fence_chain(struct dma_fence *fence)
> >>      return container_of(fence, struct dma_fence_chain, base);
> >>   }
> >>
> >> +/**
> >> + * dma_fence_chain_alloc
> >> + *
> >> + * Returns a new dma_fence_chain object
> > ... or NULL on failure.
> >
> >> + */
> >> +static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
> >> +{
> >> +    return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> >> +};
> >> +
> >> +/**
> >> + * dma_fence_chain_free
> >> + * @chain: chain node to free
> >> + *
> >> + * Frees up an allocated but not used dma_fence_chain node.
> >> + */
> >> +static inline void dma_fence_chain_free(struct dma_fence_chain *chain)
> >> +{
> >> +    kfree(chain);
> > kfree_rcu, and I guess this means this patch here should be cc: stable.
>
> Nope, kfree() is correct here.
>
> This is to free up fences which have never been initialized, so an RCU grace
> period isn't necessary because nobody could have taken a reference yet.

Hah I got tricked. I see you have the next revision out there already,
I'll drop some suggestions on the kerneldoc so it's clearer what this
does and why.
-Daniel
> Christian.
>
> > This is kinda why I'm questioning whether this "dma_fence are protected by
> > rcu" cross driver api is really a good idea. We largely get it wrong in
> > the details in a _ton_ of places.
> >
> > With the details fixed:
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> >> +
> >>   /**
> >>    * dma_fence_chain_for_each - iterate over all fences in chain
> >>    * @iter: current fence
> >> --
> >> 2.25.1
> >>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-11 10:07       ` Christian König
@ 2021-06-11 15:11         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11 15:11 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Fri, Jun 11, 2021 at 12:07:00PM +0200, Christian König wrote:
> Am 11.06.21 um 10:58 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:17:57AM +0200, Christian König wrote:
> > > Add some rather sophisticated lockless garbage collection
> > > for dma_fence_chain objects.
> > > 
> > > For this keep all initialized dma_fence_chain nodes on a
> > > queue and trigger garbage collection before a new one is
> > > allocated.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Uh hand-rolled lockless list, I'm not a fan.
> > 
> > But the real question here is why? This is a global list, so it's going to
> > look great on your desktop, but gpus are for servers now and those are
> > NUMA. So just from that pov doing garbage-collection individually still
> > feels like a much better idea.
> 
> Yeah, I was pondering on that quite a bit as well. That why I added the
> multi threaded alloc/free test.
> 
> > So what's the problem your trying to solve here?
> 
> I was not sure if the chain is garbage collected enough when used for the
> dma_resv exclusive object.
> 
> But I think we could just drop this for now and just see how it goes.

Yeah I think fixing a perf/tuning problem when we're hitting it is much
better than trying to guess what it might be and writing something really
clever to anticipate that. We generally get it wrong.

Also, with the explanation of what it tests added, the testcase makes
sense. Just some documentation or a comment would have helped.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > ---
> > >   drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
> > >   include/linux/dma-fence-chain.h   |   5 +
> > >   2 files changed, 142 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> > > index 1b4cb3e5cec9..c2f0b69eabb7 100644
> > > --- a/drivers/dma-buf/dma-fence-chain.c
> > > +++ b/drivers/dma-buf/dma-fence-chain.c
> > > @@ -9,8 +9,53 @@
> > >   #include <linux/dma-fence-chain.h>
> > > +static struct dma_fence_chain __rcu *fifo_front;
> > > +static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
> > > +
> > >   static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
> > > +/**
> > > + * dma_fence_chain_enqueue - enqeue a chain node for garbage collection
> > > + * @chain: the chain node to enqueue
> > > + *
> > > + * Add the chain node to the end of the gc fifo.
> > > + */
> > > +static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
> > > +{
> > > +	struct dma_fence_chain __rcu **tmp;
> > > +
> > > +	RCU_INIT_POINTER(chain->next, NULL);
> > > +	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
> > > +		   &chain->next);
> > > +
> > > +	/* This is intentionally unordered since we only need the fifo for gc */
> > > +	rcu_assign_pointer(*tmp, chain);
> > > +}
> > > +
> > > +/**
> > > + * dma_fence_chain_dequeue - deqeueue a chain node for garbage collection
> > > + *
> > > + * Remove the first chain node from the gc fifo while making sure that always
> > > + * keep at least one node on the fifo for lockless fifo implementation.
> > > + * Returns the dequeued chain node or NULL.
> > > + */
> > > +static struct dma_fence_chain *dma_fence_chain_dequeue(void)
> > > +{
> > > +	struct dma_fence_chain *chain, *tmp;
> > > +
> > > +	rcu_read_lock();
> > > +	chain = rcu_dereference(fifo_front);
> > > +	/* Never dequeue the last chain node for lockless fifo */
> > > +	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
> > > +		rcu_read_unlock();
> > > +		return NULL;
> > > +	}
> > > +	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
> > > +		      chain, rcu_access_pointer(chain->next));
> > > +	rcu_read_unlock();
> > > +	return tmp == chain ? chain : NULL;
> > > +}
> > > +
> > >   /**
> > >    * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
> > >    * @chain: chain node to get the previous node from
> > > @@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
> > >   	return prev;
> > >   }
> > > +/**
> > > + * dma_fence_chain_try_replace - try to replace the prev node
> > > + * @chain: Chain node we try to replace prev.
> > > + * @prev: the old prev node
> > > + *
> > > + * Try to replace the previous chain node when it or its containing fence is
> > > + * signaled. Returns true if we tried, false if we need to wait.
> > > + */
> > > +static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
> > > +					struct dma_fence *prev)
> > > +{
> > > +	struct dma_fence *replacement, *tmp;
> > > +	struct dma_fence_chain *prev_chain;
> > > +
* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
@ 2021-06-11 15:11         ` Daniel Vetter
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11 15:11 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel, Daniel Vetter

On Fri, Jun 11, 2021 at 12:07:00PM +0200, Christian König wrote:
> Am 11.06.21 um 10:58 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:17:57AM +0200, Christian König wrote:
> > > Add some rather sophisticated lockless garbage collection
> > > for dma_fence_chain objects.
> > > 
> > > For this keep all initialized dma_fence_chain nodes on a
> > > queue and trigger garbage collection before a new one is
> > > allocated.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Uh hand-rolled lockless list, I'm not a fan.
> > 
> > But the real question here is why? This is a global list, so it's going to
> > look great on your desktop, but gpus are for servers now and those are
> > NUMA. So just from that pov doing garbage-collection individually still
> > feels like a much better idea.
> 
> Yeah, I was pondering on that quite a bit as well. That why I added the
> multi threaded alloc/free test.
> 
> > So what's the problem your trying to solve here?
> 
> I was not sure if the chain is garbage collected enough when used for the
> dma_resv exclusive object.
> 
> But I think we could just drop this for now and just see how it goes.

Yeah I think fixing a perf/tuning problem when we're hitting it is much
better than trying to guess what it might be and writing something really
clever to anticipate that. We generally get it wrong.

Also, with the explanation of what it tests added, the testcase makes
sense. Just some documentation or a comment would have helped.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > ---
> > >   drivers/dma-buf/dma-fence-chain.c | 160 +++++++++++++++++++++++++-----
> > >   include/linux/dma-fence-chain.h   |   5 +
> > >   2 files changed, 142 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> > > index 1b4cb3e5cec9..c2f0b69eabb7 100644
> > > --- a/drivers/dma-buf/dma-fence-chain.c
> > > +++ b/drivers/dma-buf/dma-fence-chain.c
> > > @@ -9,8 +9,53 @@
> > >   #include <linux/dma-fence-chain.h>
> > > +static struct dma_fence_chain __rcu *fifo_front;
> > > +static struct dma_fence_chain __rcu **fifo_back = &fifo_front;
> > > +
> > >   static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
> > > +/**
> > > + * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
> > > + * @chain: the chain node to enqueue
> > > + *
> > > + * Add the chain node to the end of the gc fifo.
> > > + */
> > > +static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
> > > +{
> > > +	struct dma_fence_chain __rcu **tmp;
> > > +
> > > +	RCU_INIT_POINTER(chain->next, NULL);
> > > +	tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
> > > +		   &chain->next);
> > > +
> > > +	/* This is intentionally unordered since we only need the fifo for gc */
> > > +	rcu_assign_pointer(*tmp, chain);
> > > +}
> > > +
> > > +/**
> > > + * dma_fence_chain_dequeue - dequeue a chain node for garbage collection
> > > + *
> > > + * Remove the first chain node from the gc fifo while making sure to always
> > > + * keep at least one node on the fifo for the lockless fifo implementation.
> > > + * Returns the dequeued chain node or NULL.
> > > + */
> > > +static struct dma_fence_chain *dma_fence_chain_dequeue(void)
> > > +{
> > > +	struct dma_fence_chain *chain, *tmp;
> > > +
> > > +	rcu_read_lock();
> > > +	chain = rcu_dereference(fifo_front);
> > > +	/* Never dequeue the last chain node for lockless fifo */
> > > +	if (unlikely(!chain || !rcu_access_pointer(chain->next))) {
> > > +		rcu_read_unlock();
> > > +		return NULL;
> > > +	}
> > > +	tmp = cmpxchg((struct dma_fence_chain __force **)&fifo_front,
> > > +		      chain, rcu_access_pointer(chain->next));
> > > +	rcu_read_unlock();
> > > +	return tmp == chain ? chain : NULL;
> > > +}
> > > +
> > >   /**
> > >    * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
> > >    * @chain: chain node to get the previous node from
> > > @@ -28,6 +73,43 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
> > >   	return prev;
> > >   }
> > > +/**
> > > + * dma_fence_chain_try_replace - try to replace the prev node
> > > + * @chain: Chain node we try to replace prev.
> > > + * @prev: the old prev node
> > > + *
> > > + * Try to replace the previous chain node when it or its containing fence is
> > > + * signaled. Returns true if we tried, false if we need to wait.
> > > + */
> > > +static bool dma_fence_chain_try_replace(struct dma_fence_chain *chain,
> > > +					struct dma_fence *prev)
> > > +{
> > > +	struct dma_fence *replacement, *tmp;
> > > +	struct dma_fence_chain *prev_chain;
> > > +
> > > +	prev_chain = to_dma_fence_chain(prev);
> > > +	if (prev_chain) {
> > > +		if (!dma_fence_is_signaled(prev_chain->fence))
> > > +			return false;
> > > +
> > > +		replacement = dma_fence_chain_get_prev(prev_chain);
> > > +	} else {
> > > +		if (!dma_fence_is_signaled(prev))
> > > +			return false;
> > > +
> > > +		replacement = NULL;
> > > +	}
> > > +
> > > +	tmp = cmpxchg((struct dma_fence __force **)&chain->prev, prev,
> > > +		      replacement);
> > > +	if (tmp == prev)
> > > +		dma_fence_put(tmp);
> > > +	else
> > > +		dma_fence_put(replacement);
> > > +
> > > +	return true;
> > > +}
> > > +
> > >   /**
> > >    * dma_fence_chain_walk - chain walking function
> > >    * @fence: current chain node
> > > @@ -38,8 +120,8 @@ static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
> > >    */
> > >   struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
> > >   {
> > > -	struct dma_fence_chain *chain, *prev_chain;
> > > -	struct dma_fence *prev, *replacement, *tmp;
> > > +	struct dma_fence_chain *chain;
> > > +	struct dma_fence *prev;
> > >   	chain = to_dma_fence_chain(fence);
> > >   	if (!chain) {
> > > @@ -48,26 +130,8 @@ struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
> > >   	}
> > >   	while ((prev = dma_fence_chain_get_prev(chain))) {
> > > -
> > > -		prev_chain = to_dma_fence_chain(prev);
> > > -		if (prev_chain) {
> > > -			if (!dma_fence_is_signaled(prev_chain->fence))
> > > -				break;
> > > -
> > > -			replacement = dma_fence_chain_get_prev(prev_chain);
> > > -		} else {
> > > -			if (!dma_fence_is_signaled(prev))
> > > -				break;
> > > -
> > > -			replacement = NULL;
> > > -		}
> > > -
> > > -		tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
> > > -			      prev, replacement);
> > > -		if (tmp == prev)
> > > -			dma_fence_put(tmp);
> > > -		else
> > > -			dma_fence_put(replacement);
> > > +		if (!dma_fence_chain_try_replace(chain, prev))
> > > +			break;
> > >   		dma_fence_put(prev);
> > >   	}
> > > @@ -205,7 +269,21 @@ static void dma_fence_chain_release(struct dma_fence *fence)
> > >   	dma_fence_put(prev);
> > >   	dma_fence_put(chain->fence);
> > > -	dma_fence_free(fence);
> > > +	WRITE_ONCE(chain->fence, NULL);
> > > +
> > > +	/*
> > > +	 * Don't garbage collect here to avoid recursion and potential stack
> > > +	 * overflow.
> > > +	 */
> > > +	chain = dma_fence_chain_dequeue();
> > > +	if (!chain)
> > > +		return;
> > > +
> > > +	if (kref_read(&chain->base.refcount) ||
> > > +	    READ_ONCE(chain->fence))
> > > +		dma_fence_chain_enqueue(chain);
> > > +	else
> > > +		dma_fence_free(&chain->base);
> > >   }
> > >   const struct dma_fence_ops dma_fence_chain_ops = {
> > > @@ -218,6 +296,40 @@ const struct dma_fence_ops dma_fence_chain_ops = {
> > >   };
> > >   EXPORT_SYMBOL(dma_fence_chain_ops);
> > > +/**
> > > + * dma_fence_chain_garbage_collect - cleanup chain nodes
> > > + *
> > > + * Do some garbage collection and try to release chain nodes.
> > > + */
> > > +void dma_fence_chain_garbage_collect(void)
> > > +{
> > > +	struct dma_fence_chain *chain = dma_fence_chain_dequeue();
> > > +
> > > +	if (!chain)
> > > +		return;
> > > +
> > > +	if (!dma_fence_get_rcu(&chain->base)) {
> > > +		/* Unused, check if it's also clean */
> > > +		if (likely(!READ_ONCE(chain->fence))) {
> > > +			dma_fence_free(&chain->base);
> > > +			return;
> > > +		}
> > > +
> > > +	} else {
> > > +		struct dma_fence *prev;
> > > +
> > > +		/* Used, do some chain walk */
> > > +		prev = dma_fence_chain_get_prev(chain);
> > > +		if (prev) {
> > > +			dma_fence_chain_try_replace(chain, prev);
> > > +			dma_fence_put(prev);
> > > +		}
> > > +		dma_fence_put(&chain->base);
> > > +	}
> > > +	dma_fence_chain_enqueue(chain);
> > > +}
> > > +EXPORT_SYMBOL(dma_fence_chain_garbage_collect);
> > > +
> > >   /**
> > >    * dma_fence_chain_init - initialize a fence chain
> > >    * @chain: the chain node to initialize
> > > @@ -254,5 +366,7 @@ void dma_fence_chain_init(struct dma_fence_chain *chain,
> > >   	dma_fence_init(&chain->base, &dma_fence_chain_ops,
> > >   		       &chain->lock, context, seqno);
> > > +	dma_fence_chain_enqueue(chain);
> > >   }
> > > +
> > >   EXPORT_SYMBOL(dma_fence_chain_init);
> > > diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h
> > > index 5f45689a6d2e..b412b5396434 100644
> > > --- a/include/linux/dma-fence-chain.h
> > > +++ b/include/linux/dma-fence-chain.h
> > > @@ -19,6 +19,7 @@
> > >    * @base: fence base class
> > >    * @lock: spinlock for fence handling
> > >    * @prev: previous fence of the chain
> > > + * @next: next chain node for garbage collection
> > >    * @prev_seqno: original previous seqno before garbage collection
> > >    * @fence: encapsulated fence
> > >    * @cb: callback structure for signaling
> > > @@ -27,6 +28,7 @@
> > >   struct dma_fence_chain {
> > >   	struct dma_fence base;
> > >   	struct dma_fence __rcu *prev;
> > > +	struct dma_fence_chain __rcu *next;
> > >   	u64 prev_seqno;
> > >   	struct dma_fence *fence;
> > >   	union {
> > > @@ -38,6 +40,8 @@ struct dma_fence_chain {
> > >   extern const struct dma_fence_ops dma_fence_chain_ops;
> > > +void dma_fence_chain_garbage_collect(void);
> > > +
> > >   /**
> > >    * to_dma_fence_chain - cast a fence to a dma_fence_chain
> > >    * @fence: fence to cast to a dma_fence_array
> > > @@ -61,6 +65,7 @@ to_dma_fence_chain(struct dma_fence *fence)
> > >    */
> > >   static inline struct dma_fence_chain *dma_fence_chain_alloc(void)
> > >   {
> > > +	dma_fence_chain_garbage_collect();
> > >   	return kmalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> > >   };
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-11 10:09       ` Christian König
@ 2021-06-11 15:18         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11 15:18 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
> Am 11.06.21 um 11:07 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
> > > Unwrap the explicit fence if it is a dma_fence_chain and
> > > sync to the first fence not matching the owner rules.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
> > >   1 file changed, 68 insertions(+), 50 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -28,6 +28,8 @@
> > >    *    Christian König <christian.koenig@amd.com>
> > >    */
> > > +#include <linux/dma-fence-chain.h>
> > > +
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_trace.h"
> > >   #include "amdgpu_amdkfd.h"
> > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
> > >   	return amdgpu_sync_fence(sync, fence);
> > >   }
> > > +/* Determine based on the owner and mode if we should sync to a fence or not */
> > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > +				   enum amdgpu_sync_mode mode,
> > > +				   void *owner, struct dma_fence *f)
> > > +{
> > > +	void *fence_owner = amdgpu_sync_get_owner(f);
> > > +
> > > +	/* Always sync to moves, no matter what */
> > > +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > +		return true;
> > > +
> > > +	/* We only want to trigger KFD eviction fences on
> > > +	 * evict or move jobs. Skip KFD fences otherwise.
> > > +	 */
> > > +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > +		return false;
> > > +
> > > +	/* Never sync to VM updates either. */
> > > +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > +		return false;
> > > +
> > > +	/* Ignore fences depending on the sync mode */
> > > +	switch (mode) {
> > > +	case AMDGPU_SYNC_ALWAYS:
> > > +		return true;
> > > +
> > > +	case AMDGPU_SYNC_NE_OWNER:
> > > +		if (amdgpu_sync_same_dev(adev, f) &&
> > > +		    fence_owner == owner)
> > > +			return false;
> > > +		break;
> > > +
> > > +	case AMDGPU_SYNC_EQ_OWNER:
> > > +		if (amdgpu_sync_same_dev(adev, f) &&
> > > +		    fence_owner != owner)
> > > +			return false;
> > > +		break;
> > > +
> > > +	case AMDGPU_SYNC_EXPLICIT:
> > > +		return false;
> > > +	}
> > > +
> > > +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > +	     "Adding eviction fence to sync obj");
> > > +	return true;
> > > +}
> > > +
> > >   /**
> > >    * amdgpu_sync_resv - sync to a reservation object
> > >    *
> > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
> > >   	/* always sync to the exclusive fence */
> > >   	f = dma_resv_excl_fence(resv);
> > > -	r = amdgpu_sync_fence(sync, f);
> > > +	dma_fence_chain_for_each(f, f) {
> > Jason has some helper for deep-walking fence chains/arrays here I think.
> > Might want to look into that, so that we have some consistency in how we
> > pile up multiple exclusive fences.
> 
> Well those helpers are not from Jason, but from me :)
> 
> But no, for now the deep inspection is not really helpful here since
> grabbing a reference to a certain chain node is what makes the handling
> easier and faster here.
> 
> Thinking more about it, that should also make it possible for the garbage
> collection to kick in properly.

Hm this is tricky to reason about, but yeah with this here it's a true
chain, and you just need to connect them. But then if a buffer is on
multiple engines, collapsing things down occasionally might be useful.

But maybe we need to do that in the bigger rework where exclusive fences
are also just in the dma_fence_list with a "this is an exclusive one btw"
tag.

I think for the vk import case doing the deep scan makes more sense, it's
a once-per-frame thing, and there's a much bigger chance that you have a
pile of fences from different engines on it already.

I think a comment explaining why we think deep scan isn't a good idea here
would be good, just so we can appreciate our foolishness when it all goes
wrong :-)
-Daniel


> > 
> > Anyway pretty much one of the versions I had in mind too, except I didn't
> > type it up.
> > 
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Thanks,
> Christian.
> 
> > 
> > > +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > > +
> > > +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > > +					   chain->fence : f)) {
> > > +			r = amdgpu_sync_fence(sync, f);
> > > +			dma_fence_put(f);
> > > +			if (r)
> > > +				return r;
> > > +			break;
> > > +		}
> > > +	}
> > >   	flist = dma_resv_shared_list(resv);
> > > -	if (!flist || r)
> > > -		return r;
> > > +	if (!flist)
> > > +		return 0;
> > >   	for (i = 0; i < flist->shared_count; ++i) {
> > > -		void *fence_owner;
> > > -
> > >   		f = rcu_dereference_protected(flist->shared[i],
> > >   					      dma_resv_held(resv));
> > > -		fence_owner = amdgpu_sync_get_owner(f);
> > > -
> > > -		/* Always sync to moves, no matter what */
> > > -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > > +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > >   			r = amdgpu_sync_fence(sync, f);
> > >   			if (r)
> > > -				break;
> > > -		}
> > > -
> > > -		/* We only want to trigger KFD eviction fences on
> > > -		 * evict or move jobs. Skip KFD fences otherwise.
> > > -		 */
> > > -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > -			continue;
> > > -
> > > -		/* Never sync to VM updates either. */
> > > -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > -			continue;
> > > -
> > > -		/* Ignore fences depending on the sync mode */
> > > -		switch (mode) {
> > > -		case AMDGPU_SYNC_ALWAYS:
> > > -			break;
> > > -
> > > -		case AMDGPU_SYNC_NE_OWNER:
> > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > -			    fence_owner == owner)
> > > -				continue;
> > > -			break;
> > > -
> > > -		case AMDGPU_SYNC_EQ_OWNER:
> > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > -			    fence_owner != owner)
> > > -				continue;
> > > -			break;
> > > -
> > > -		case AMDGPU_SYNC_EXPLICIT:
> > > -			continue;
> > > +				return r;
> > >   		}
> > > -
> > > -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > -		     "Adding eviction fence to sync obj");
> > > -		r = amdgpu_sync_fence(sync, f);
> > > -		if (r)
> > > -			break;
> > >   	}
> > > -	return r;
> > > +	return 0;
> > >   }
> > >   /**
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread


* Re: [PATCH 7/7] drm/amdgpu: rework dma_resv handling
  2021-06-11 10:12       ` Christian König
@ 2021-06-11 15:21         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-11 15:21 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Fri, Jun 11, 2021 at 12:12:45PM +0200, Christian König wrote:
> Am 11.06.21 um 11:17 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:18:00AM +0200, Christian König wrote:
> > > Drop the workaround and instead implement a better solution.
> > > 
> > > Basically we are now chaining all submissions using a dma_fence_chain
> > > container and adding them as exclusive fence to the dma_resv object.
> > > 
> > > This way other drivers can still sync to the single exclusive fence
> > > while amdgpu only syncs to fences from different processes.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 54 +++++++++++++----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 ---------------------
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  3 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
> > >   6 files changed, 47 insertions(+), 79 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > index a130e766cbdb..c905a4cfc173 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
> > >   struct amdgpu_bo_list_entry {
> > >   	struct ttm_validate_buffer	tv;
> > >   	struct amdgpu_bo_va		*bo_va;
> > > +	struct dma_fence_chain		*chain;
> > >   	uint32_t			priority;
> > >   	struct page			**user_pages;
> > >   	bool				user_invalidated;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > index 325e82621467..f6f3029f958d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > @@ -587,6 +587,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > >   		goto out;
> > >   	}
> > > +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > +		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> > > +
> > > +		e->bo_va = amdgpu_vm_bo_find(vm, bo);
> > > +
> > > +		if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> > > +			e->chain = dma_fence_chain_alloc();
> > > +			if (!e->chain) {
> > > +				r = -ENOMEM;
> > > +				goto error_validate;
> > > +			}
> > > +		}
> > > +	}
> > > +
> > >   	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
> > >   					  &p->bytes_moved_vis_threshold);
> > >   	p->bytes_moved = 0;
> > > @@ -614,15 +628,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > >   	gws = p->bo_list->gws_obj;
> > >   	oa = p->bo_list->oa_obj;
> > > -	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > -		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> > > -
> > > -		/* Make sure we use the exclusive slot for shared BOs */
> > > -		if (bo->prime_shared_count)
> > > -			e->tv.num_shared = 0;
> > > -		e->bo_va = amdgpu_vm_bo_find(vm, bo);
> > > -	}
> > > -
> > >   	if (gds) {
> > >   		p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
> > >   		p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> > > @@ -644,8 +649,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > >   	}
> > >   error_validate:
> > > -	if (r)
> > > +	if (r) {
> > > +		amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > +			dma_fence_chain_free(e->chain);
> > > +			e->chain = NULL;
> > > +		}
> > >   		ttm_eu_backoff_reservation(&p->ticket, &p->validated);
> > > +	}
> > >   out:
> > >   	return r;
> > >   }
> > > @@ -685,9 +695,17 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
> > >   {
> > >   	unsigned i;
> > > -	if (error && backoff)
> > > +	if (error && backoff) {
> > > +		struct amdgpu_bo_list_entry *e;
> > > +
> > > +		amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> > > +			dma_fence_chain_free(e->chain);
> > > +			e->chain = NULL;
> > > +		}
> > > +
> > >   		ttm_eu_backoff_reservation(&parser->ticket,
> > >   					   &parser->validated);
> > > +	}
> > >   	for (i = 0; i < parser->num_post_deps; i++) {
> > >   		drm_syncobj_put(parser->post_deps[i].syncobj);
> > > @@ -1260,6 +1278,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > >   	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
> > > +	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > +		struct dma_resv *resv = e->tv.bo->base.resv;
> > > +		struct dma_fence_chain *chain = e->chain;
> > > +
> > > +		if (!chain)
> > > +			continue;
> > > +
> > > +		dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
> > > +				     dma_fence_get(p->fence), 1);
> > > +
> > > +		rcu_assign_pointer(resv->fence_excl, &chain->base);
> > So for safety since this is now a driver interface I was thinking of a
> > helper which does this entire dance and _also_ adds the new fence to the
> > shared slots. This way we don't let drivers open-code this rather tricky
> > operation.
> 
> Well I only see this as a workaround for amdgpu and don't want to leak it
> into other drivers.
> 
> If somebody else wants to adopt it we should probably consider fixing the
> dma_resv object instead.

Well semantically this is exactly what the sync_file import from
Jason does. And it's kinda also what a lot of the drivers with opt-out
implicit sync should be doing.

So I really think we do need this as a dma_resv primitive in the form of a
function drivers can use. Once we have that rolled out and tested some,
then we can fix up the internals of dma_resv to make this case more
efficient.

> > You probably also want to then combine this with
> > ttm_eu_fence_buffer_objects below I think so you're not walking this list
> > 2x.
> > 
> > Iow I'd put the fence_chain parameter into struct ttm_validate_buffer and
> > then let ttm_eu_fence_buffer_objects() do this, by calling a new
> > dma_resv_add_shared_excl_fence. Ideally the same helper that also Jason's
> > sync_file import will use. We want to hide the inner workings of dma_resv
> > as much as possible ofc.
> > 
> > The other thing I'm wondering is whether this needs to be wrapped in a
> > single seqlock or whether we don't have ordering issues if we split up the
> > update here? Setting the exclusive fence before we also added it to the
> > shared slot can at least violate the "shared fences signal after exclusive
> > if both are present"
> 
> Uff, good point.

Plus it's actually a fairly tricky thing :-)
-Daniel

> 
> > Finally I guess we need to sprinkle the manual garbage collector somewhere
> > here too.
> 
> That is done automatically when somebody iterates the chain node.
> 
> Christian.
> 
> > 
> > But aside from the interface polish and correctness against races I think
> > this looks solid in the big picture.
> > 
> > Cheers, Daniel
> > 
> > > +		e->chain = NULL;
> > > +	}
> > > +
> > >   	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
> > >   	mutex_unlock(&p->adev->notifier_lock);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > index c3053b83b80c..23219fc3b28c 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > @@ -42,48 +42,6 @@
> > >   #include <linux/pci-p2pdma.h>
> > >   #include <linux/pm_runtime.h>
> > > -static int
> > > -__dma_resv_make_exclusive(struct dma_resv *obj)
> > > -{
> > > -	struct dma_fence **fences;
> > > -	unsigned int count;
> > > -	int r;
> > > -
> > > -	if (!dma_resv_shared_list(obj)) /* no shared fences to convert */
> > > -		return 0;
> > > -
> > > -	r = dma_resv_get_fences(obj, NULL, &count, &fences);
> > > -	if (r)
> > > -		return r;
> > > -
> > > -	if (count == 0) {
> > > -		/* Now that was unexpected. */
> > > -	} else if (count == 1) {
> > > -		dma_resv_add_excl_fence(obj, fences[0]);
> > > -		dma_fence_put(fences[0]);
> > > -		kfree(fences);
> > > -	} else {
> > > -		struct dma_fence_array *array;
> > > -
> > > -		array = dma_fence_array_create(count, fences,
> > > -					       dma_fence_context_alloc(1), 0,
> > > -					       false);
> > > -		if (!array)
> > > -			goto err_fences_put;
> > > -
> > > -		dma_resv_add_excl_fence(obj, &array->base);
> > > -		dma_fence_put(&array->base);
> > > -	}
> > > -
> > > -	return 0;
> > > -
> > > -err_fences_put:
> > > -	while (count--)
> > > -		dma_fence_put(fences[count]);
> > > -	kfree(fences);
> > > -	return -ENOMEM;
> > > -}
> > > -
> > >   /**
> > >    * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
> > >    *
> > > @@ -110,24 +68,6 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
> > >   	if (r < 0)
> > >   		goto out;
> > > -	r = amdgpu_bo_reserve(bo, false);
> > > -	if (unlikely(r != 0))
> > > -		goto out;
> > > -
> > > -	/*
> > > -	 * We only create shared fences for internal use, but importers
> > > -	 * of the dmabuf rely on exclusive fences for implicitly
> > > -	 * tracking write hazards. As any of the current fences may
> > > -	 * correspond to a write, we need to convert all existing
> > > -	 * fences on the reservation object into a single exclusive
> > > -	 * fence.
> > > -	 */
> > > -	r = __dma_resv_make_exclusive(bo->tbo.base.resv);
> > > -	if (r)
> > > -		goto out;
> > > -
> > > -	bo->prime_shared_count++;
> > > -	amdgpu_bo_unreserve(bo);
> > >   	return 0;
> > >   out:
> > > @@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
> > >   	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> > >   	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > > -	if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
> > > -		bo->prime_shared_count--;
> > > -
> > >   	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> > >   	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> > >   }
> > > @@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
> > >   	bo = gem_to_amdgpu_bo(gobj);
> > >   	bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
> > >   	bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
> > > -	if (dma_buf->ops != &amdgpu_dmabuf_ops)
> > > -		bo->prime_shared_count = 1;
> > >   	dma_resv_unlock(resv);
> > >   	return gobj;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > index 1c3e3b608332..5d82120fc160 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > @@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
> > >   		break;
> > >   	}
> > >   	case AMDGPU_GEM_OP_SET_PLACEMENT:
> > > -		if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
> > > +		if (robj->tbo.base.import_attach &&
> > > +		    args->value & AMDGPU_GEM_DOMAIN_VRAM) {
> > >   			r = -EINVAL;
> > >   			amdgpu_bo_unreserve(robj);
> > >   			break;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index 96447e1d4c9c..30951b593809 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
> > >   		return -EINVAL;
> > >   	/* A shared bo cannot be migrated to VRAM */
> > > -	if (bo->prime_shared_count || bo->tbo.base.import_attach) {
> > > +	if (bo->tbo.base.import_attach) {
> > >   		if (domain & AMDGPU_GEM_DOMAIN_GTT)
> > >   			domain = AMDGPU_GEM_DOMAIN_GTT;
> > >   		else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > index b35962702278..55d7e90eaa72 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > @@ -98,7 +98,6 @@ struct amdgpu_bo {
> > >   	struct ttm_buffer_object	tbo;
> > >   	struct ttm_bo_kmap_obj		kmap;
> > >   	u64				flags;
> > > -	unsigned			prime_shared_count;
> > >   	/* per VM structure for page tables and with virtual addresses */
> > >   	struct amdgpu_vm_bo_base	*vm_bo;
> > >   	/* Constant after initialization */
> > > -- 
> > > 2.25.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-11 15:18         ` Daniel Vetter
@ 2021-06-14  7:25           ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-14  7:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

Am 11.06.21 um 17:18 schrieb Daniel Vetter:
> On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
>> Am 11.06.21 um 11:07 schrieb Daniel Vetter:
>>> On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
>>>> Unwrap the explicit fence if it is a dma_fence_chain and
>>>> sync to the first fence not matching the owner rules.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
>>>>    1 file changed, 68 insertions(+), 50 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> index 1b2ceccaf5b0..862eb3c1c4c5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> @@ -28,6 +28,8 @@
>>>>     *    Christian König <christian.koenig@amd.com>
>>>>     */
>>>> +#include <linux/dma-fence-chain.h>
>>>> +
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_trace.h"
>>>>    #include "amdgpu_amdkfd.h"
>>>> @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
>>>>    	return amdgpu_sync_fence(sync, fence);
>>>>    }
>>>> +/* Determine based on the owner and mode if we should sync to a fence or not */
>>>> +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
>>>> +				   enum amdgpu_sync_mode mode,
>>>> +				   void *owner, struct dma_fence *f)
>>>> +{
>>>> +	void *fence_owner = amdgpu_sync_get_owner(f);
>>>> +
>>>> +	/* Always sync to moves, no matter what */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return true;
>>>> +
>>>> +	/* We only want to trigger KFD eviction fences on
>>>> +	 * evict or move jobs. Skip KFD fences otherwise.
>>>> +	 */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return false;
>>>> +
>>>> +	/* Never sync to VM updates either. */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>>>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return false;
>>>> +
>>>> +	/* Ignore fences depending on the sync mode */
>>>> +	switch (mode) {
>>>> +	case AMDGPU_SYNC_ALWAYS:
>>>> +		return true;
>>>> +
>>>> +	case AMDGPU_SYNC_NE_OWNER:
>>>> +		if (amdgpu_sync_same_dev(adev, f) &&
>>>> +		    fence_owner == owner)
>>>> +			return false;
>>>> +		break;
>>>> +
>>>> +	case AMDGPU_SYNC_EQ_OWNER:
>>>> +		if (amdgpu_sync_same_dev(adev, f) &&
>>>> +		    fence_owner != owner)
>>>> +			return false;
>>>> +		break;
>>>> +
>>>> +	case AMDGPU_SYNC_EXPLICIT:
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>>>> +	     "Adding eviction fence to sync obj");
>>>> +	return true;
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_sync_resv - sync to a reservation object
>>>>     *
>>>> @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
>>>>    	/* always sync to the exclusive fence */
>>>>    	f = dma_resv_excl_fence(resv);
>>>> -	r = amdgpu_sync_fence(sync, f);
>>>> +	dma_fence_chain_for_each(f, f) {
>>> Jason has some helper for deep-walking fence chains/arrays here I think.
>>> Might want to look into that, so that we have some consistency in how we
>>> pile up multiple exclusive fences.
>> Well those helpers are not from Jason, but from me :)
>>
>> But no, for now the deep inspection is not really helpful here since
>> grabbing a reference to a certain chain node is what makes the handling
>> easier and faster here.
>>
>> Thinking more about it, that should also make it possible for the garbage
>> collection to kick in properly.
> Hm this is tricky to reason about, but yeah with this here it's a true
> chain, and you just need to connect them. But then if a buffer is on
> multiple engines, collapsing things down occasionally might be useful.
>
> But maybe we need to do that in the bigger rework where exclusive fences
> are also just in the dma_fence_list with a "this is an exclusive one btw"
> tag.
>
> I think for the vk import case doing the deep scan makes more sense, it's
> a once-per-frame thing, and there's a much bigger chance that you have a
> pile of fences from different engines on it already.

The problem with Jason's IOCTL is that you *must* do a deep dive and 
flatten out the fences.

Otherwise somebody could use it to create a deep fence structure with 
dma_fence_arrays containing dma_fence_arrays, containing 
dma_fence_arrays etc...

When you then release that structure you overwrite the kernel stack because 
the dma_fence_array does a dma_fence_put() on its entries :)

The dma_fence_chain container is intentionally made in a way to prevent 
that.

> I think a comment explaining why we think deep scan isn't a good idea here
> would be good, just so we can appreciate our foolishness when it all goes
> wrong :-)

Ok, good point.

Thanks,
Christian.

> -Daniel
>
>
>>> Anyway pretty much one of the versions I had in mind too, except I didn't
>>> type it up.
>>>
>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Thanks,
>> Christian.
>>
>>>> +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
>>>> +
>>>> +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
>>>> +					   chain->fence : f)) {
>>>> +			r = amdgpu_sync_fence(sync, f);
>>>> +			dma_fence_put(f);
>>>> +			if (r)
>>>> +				return r;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>>    	flist = dma_resv_shared_list(resv);
>>>> -	if (!flist || r)
>>>> -		return r;
>>>> +	if (!flist)
>>>> +		return 0;
>>>>    	for (i = 0; i < flist->shared_count; ++i) {
>>>> -		void *fence_owner;
>>>> -
>>>>    		f = rcu_dereference_protected(flist->shared[i],
>>>>    					      dma_resv_held(resv));
>>>> -		fence_owner = amdgpu_sync_get_owner(f);
>>>> -
>>>> -		/* Always sync to moves, no matter what */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
>>>> +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
>>>>    			r = amdgpu_sync_fence(sync, f);
>>>>    			if (r)
>>>> -				break;
>>>> -		}
>>>> -
>>>> -		/* We only want to trigger KFD eviction fences on
>>>> -		 * evict or move jobs. Skip KFD fences otherwise.
>>>> -		 */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> -			continue;
>>>> -
>>>> -		/* Never sync to VM updates either. */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>>>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> -			continue;
>>>> -
>>>> -		/* Ignore fences depending on the sync mode */
>>>> -		switch (mode) {
>>>> -		case AMDGPU_SYNC_ALWAYS:
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_NE_OWNER:
>>>> -			if (amdgpu_sync_same_dev(adev, f) &&
>>>> -			    fence_owner == owner)
>>>> -				continue;
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_EQ_OWNER:
>>>> -			if (amdgpu_sync_same_dev(adev, f) &&
>>>> -			    fence_owner != owner)
>>>> -				continue;
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_EXPLICIT:
>>>> -			continue;
>>>> +				return r;
>>>>    		}
>>>> -
>>>> -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>>>> -		     "Adding eviction fence to sync obj");
>>>> -		r = amdgpu_sync_fence(sync, f);
>>>> -		if (r)
>>>> -			break;
>>>>    	}
>>>> -	return r;
>>>> +	return 0;
>>>>    }
>>>>    /**
>>>> -- 
>>>> 2.25.1
>>>>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
@ 2021-06-14  7:25           ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-06-14  7:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: amd-gfx, dri-devel

Am 11.06.21 um 17:18 schrieb Daniel Vetter:
> On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
>> Am 11.06.21 um 11:07 schrieb Daniel Vetter:
>>> On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
>>>> Unwrap the explicit fence if it is a dma_fence_chain and
>>>> sync to the first fence not matching the owner rules.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
>>>>    1 file changed, 68 insertions(+), 50 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> index 1b2ceccaf5b0..862eb3c1c4c5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> @@ -28,6 +28,8 @@
>>>>     *    Christian König <christian.koenig@amd.com>
>>>>     */
>>>> +#include <linux/dma-fence-chain.h>
>>>> +
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_trace.h"
>>>>    #include "amdgpu_amdkfd.h"
>>>> @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
>>>>    	return amdgpu_sync_fence(sync, fence);
>>>>    }
>>>> +/* Determine based on the owner and mode if we should sync to a fence or not */
>>>> +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
>>>> +				   enum amdgpu_sync_mode mode,
>>>> +				   void *owner, struct dma_fence *f)
>>>> +{
>>>> +	void *fence_owner = amdgpu_sync_get_owner(f);
>>>> +
>>>> +	/* Always sync to moves, no matter what */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return true;
>>>> +
>>>> +	/* We only want to trigger KFD eviction fences on
>>>> +	 * evict or move jobs. Skip KFD fences otherwise.
>>>> +	 */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return false;
>>>> +
>>>> +	/* Never sync to VM updates either. */
>>>> +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>>>> +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +		return false;
>>>> +
>>>> +	/* Ignore fences depending on the sync mode */
>>>> +	switch (mode) {
>>>> +	case AMDGPU_SYNC_ALWAYS:
>>>> +		return true;
>>>> +
>>>> +	case AMDGPU_SYNC_NE_OWNER:
>>>> +		if (amdgpu_sync_same_dev(adev, f) &&
>>>> +		    fence_owner == owner)
>>>> +			return false;
>>>> +		break;
>>>> +
>>>> +	case AMDGPU_SYNC_EQ_OWNER:
>>>> +		if (amdgpu_sync_same_dev(adev, f) &&
>>>> +		    fence_owner != owner)
>>>> +			return false;
>>>> +		break;
>>>> +
>>>> +	case AMDGPU_SYNC_EXPLICIT:
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>>>> +	     "Adding eviction fence to sync obj");
>>>> +	return true;
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_sync_resv - sync to a reservation object
>>>>     *
>>>> @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
>>>>    	/* always sync to the exclusive fence */
>>>>    	f = dma_resv_excl_fence(resv);
>>>> -	r = amdgpu_sync_fence(sync, f);
>>>> +	dma_fence_chain_for_each(f, f) {
>>> Jason has some helper for deep-walking fence chains/arrays here I think.
>>> Might want to look into that, so that we have some consistency in how we
>>> pile up multiple exclusive fences.
>> Well those helpers are not from Jason, but from me :)
>>
>> But no, for now the deep inspection is not really helpful here since
>> grabbing a reference to a certain chain node is what makes the handling
>> easier and faster here.
>>
>> Thinking more about it that should also make it possible for the garbage
>> collection to kick in properly.
> Hm this is tricky to reason about, but yeah with this here it's a true
> chain, and you just need to connect them. But then if a buffer is on
> multiple engines, collapsing things down occasionally might be useful.
>
> But maybe we need to do that in the bigger rework where exclusive fences
> are also just in the dma_fence_list with a "this is an exclusive one btw"
> tag.
>
> I think for the vk import case doing the deep scan makes more sense, it's
> a once-per-frame thing, and there's a much bigger chance that you have a
> pile of fences from different engines on it already.

The problem with Jason's IOCTL is that you *must* do a deep dive and 
flatten out the fences.

Otherwise somebody could use it to create a deep fence structure with 
dma_fence_arrays containing dma_fence_arrays, containing 
dma_fence_arrays etc...

When you then release that structure you overwrite the kernel stack because 
the dma_fence_array does a dma_fence_put() on its entries :)

The dma_fence_chain container is intentionally made in a way to prevent 
that.

> I think a comment explaining why we think deep scan isn't a good idea here
> would be good, just so we can appreciate our foolishness when it all goes
> wrong :-)

Ok, good point.

Thanks,
Christian.

> -Daniel
>
>
>>> Anyway pretty much one of the versions I had in mind too, except I didn't
>>> type it up.
>>>
>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Thanks,
>> Christian.
>>
>>>> +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
>>>> +
>>>> +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
>>>> +					   chain->fence : f)) {
>>>> +			r = amdgpu_sync_fence(sync, f);
>>>> +			dma_fence_put(f);
>>>> +			if (r)
>>>> +				return r;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>>    	flist = dma_resv_shared_list(resv);
>>>> -	if (!flist || r)
>>>> -		return r;
>>>> +	if (!flist)
>>>> +		return 0;
>>>>    	for (i = 0; i < flist->shared_count; ++i) {
>>>> -		void *fence_owner;
>>>> -
>>>>    		f = rcu_dereference_protected(flist->shared[i],
>>>>    					      dma_resv_held(resv));
>>>> -		fence_owner = amdgpu_sync_get_owner(f);
>>>> -
>>>> -		/* Always sync to moves, no matter what */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
>>>> +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
>>>>    			r = amdgpu_sync_fence(sync, f);
>>>>    			if (r)
>>>> -				break;
>>>> -		}
>>>> -
>>>> -		/* We only want to trigger KFD eviction fences on
>>>> -		 * evict or move jobs. Skip KFD fences otherwise.
>>>> -		 */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> -			continue;
>>>> -
>>>> -		/* Never sync to VM updates either. */
>>>> -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
>>>> -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> -			continue;
>>>> -
>>>> -		/* Ignore fences depending on the sync mode */
>>>> -		switch (mode) {
>>>> -		case AMDGPU_SYNC_ALWAYS:
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_NE_OWNER:
>>>> -			if (amdgpu_sync_same_dev(adev, f) &&
>>>> -			    fence_owner == owner)
>>>> -				continue;
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_EQ_OWNER:
>>>> -			if (amdgpu_sync_same_dev(adev, f) &&
>>>> -			    fence_owner != owner)
>>>> -				continue;
>>>> -			break;
>>>> -
>>>> -		case AMDGPU_SYNC_EXPLICIT:
>>>> -			continue;
>>>> +				return r;
>>>>    		}
>>>> -
>>>> -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
>>>> -		     "Adding eviction fence to sync obj");
>>>> -		r = amdgpu_sync_fence(sync, f);
>>>> -		if (r)
>>>> -			break;
>>>>    	}
>>>> -	return r;
>>>> +	return 0;
>>>>    }
>>>>    /**
>>>> -- 
>>>> 2.25.1
>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-10  9:17   ` Christian König
@ 2021-06-16 17:29     ` kernel test robot
  -1 siblings, 0 replies; 58+ messages in thread
From: kernel test robot @ 2021-06-16 17:29 UTC (permalink / raw)
  To: Christian König, daniel, dri-devel, amd-gfx; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3338 bytes --]

Hi Christian,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next linus/master v5.13-rc6 next-20210616]
[cannot apply to drm-tip/drm-tip drm/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Christian-K-nig/dma-buf-some-dma_fence_chain-improvements/20210616-182806
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: alpha-randconfig-s031-20210615 (attached as .config)
compiler: alpha-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/058b99a2929f84c7f2cb1516dcc8674bf3f2a661
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Christian-K-nig/dma-buf-some-dma_fence_chain-improvements/20210616-182806
        git checkout 058b99a2929f84c7f2cb1516dcc8674bf3f2a661
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=alpha 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/dma-buf/dma-fence-chain.c:28:15: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct dma_fence_chain **_x_ @@     got struct dma_fence_chain [noderef] __rcu ** @@
   drivers/dma-buf/dma-fence-chain.c:28:15: sparse:     expected struct dma_fence_chain **_x_
   drivers/dma-buf/dma-fence-chain.c:28:15: sparse:     got struct dma_fence_chain [noderef] __rcu **
>> drivers/dma-buf/dma-fence-chain.c:28:13: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct dma_fence_chain [noderef] __rcu **tmp @@     got struct dma_fence_chain **[assigned] __ret @@
   drivers/dma-buf/dma-fence-chain.c:28:13: sparse:     expected struct dma_fence_chain [noderef] __rcu **tmp
   drivers/dma-buf/dma-fence-chain.c:28:13: sparse:     got struct dma_fence_chain **[assigned] __ret

vim +28 drivers/dma-buf/dma-fence-chain.c

    16	
    17	/**
    18	 * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
    19	 * @chain: the chain node to enqueue
    20	 *
    21	 * Add the chain node to the end of the gc fifo.
    22	 */
    23	static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
    24	{
    25		struct dma_fence_chain __rcu **tmp;
    26	
    27		RCU_INIT_POINTER(chain->next, NULL);
  > 28		tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
    29			   &chain->next);
    30	
    31		/* This is intentionally unordered since we only need the fifo for gc */
    32		rcu_assign_pointer(*tmp, chain);
    33	}
    34	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22120 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread


* Re: [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect
  2021-06-10  9:17   ` Christian König
@ 2021-06-16 18:50     ` kernel test robot
  -1 siblings, 0 replies; 58+ messages in thread
From: kernel test robot @ 2021-06-16 18:50 UTC (permalink / raw)
  To: Christian König, daniel, dri-devel, amd-gfx; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3148 bytes --]

Hi Christian,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next linus/master v5.13-rc6 next-20210616]
[cannot apply to drm-tip/drm-tip drm/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Christian-K-nig/dma-buf-some-dma_fence_chain-improvements/20210616-182806
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-s001-20210615 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/058b99a2929f84c7f2cb1516dcc8674bf3f2a661
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Christian-K-nig/dma-buf-some-dma_fence_chain-improvements/20210616-182806
        git checkout 058b99a2929f84c7f2cb1516dcc8674bf3f2a661
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/dma-buf/dma-fence-chain.c:28:15: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct dma_fence_chain **__ret @@     got struct dma_fence_chain [noderef] __rcu ** @@
   drivers/dma-buf/dma-fence-chain.c:28:15: sparse:     expected struct dma_fence_chain **__ret
   drivers/dma-buf/dma-fence-chain.c:28:15: sparse:     got struct dma_fence_chain [noderef] __rcu **
   drivers/dma-buf/dma-fence-chain.c:28:13: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct dma_fence_chain [noderef] __rcu **tmp @@     got struct dma_fence_chain **[assigned] __ret @@
   drivers/dma-buf/dma-fence-chain.c:28:13: sparse:     expected struct dma_fence_chain [noderef] __rcu **tmp
   drivers/dma-buf/dma-fence-chain.c:28:13: sparse:     got struct dma_fence_chain **[assigned] __ret

vim +28 drivers/dma-buf/dma-fence-chain.c

    16	
    17	/**
    18	 * dma_fence_chain_enqueue - enqueue a chain node for garbage collection
    19	 * @chain: the chain node to enqueue
    20	 *
    21	 * Add the chain node to the end of the gc fifo.
    22	 */
    23	static void dma_fence_chain_enqueue(struct dma_fence_chain *chain)
    24	{
    25		struct dma_fence_chain __rcu **tmp;
    26	
    27		RCU_INIT_POINTER(chain->next, NULL);
  > 28		tmp = xchg((struct dma_fence_chain __force ***)&fifo_back,
    29			   &chain->next);
    30	
    31		/* This is intentionally unordered since we only need the fifo for gc */
    32		rcu_assign_pointer(*tmp, chain);
    33	}
    34	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42304 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread


* Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence
  2021-06-14  7:25           ` Christian König
@ 2021-06-17 16:51             ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-06-17 16:51 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx, dri-devel

On Mon, Jun 14, 2021 at 09:25:44AM +0200, Christian König wrote:
> Am 11.06.21 um 17:18 schrieb Daniel Vetter:
> > On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
> > > Am 11.06.21 um 11:07 schrieb Daniel Vetter:
> > > > On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
> > > > > Unwrap the explicit fence if it is a dma_fence_chain and
> > > > > sync to the first fence not matching the owner rules.
> > > > > 
> > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > ---
> > > > >    drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +++++++++++++----------
> > > > >    1 file changed, 68 insertions(+), 50 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > @@ -28,6 +28,8 @@
> > > > >     *    Christian König <christian.koenig@amd.com>
> > > > >     */
> > > > > +#include <linux/dma-fence-chain.h>
> > > > > +
> > > > >    #include "amdgpu.h"
> > > > >    #include "amdgpu_trace.h"
> > > > >    #include "amdgpu_amdkfd.h"
> > > > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct dma_fence *fence)
> > > > >    	return amdgpu_sync_fence(sync, fence);
> > > > >    }
> > > > > +/* Determine based on the owner and mode if we should sync to a fence or not */
> > > > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > > > +				   enum amdgpu_sync_mode mode,
> > > > > +				   void *owner, struct dma_fence *f)
> > > > > +{
> > > > > +	void *fence_owner = amdgpu_sync_get_owner(f);
> > > > > +
> > > > > +	/* Always sync to moves, no matter what */
> > > > > +	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > +		return true;
> > > > > +
> > > > > +	/* We only want to trigger KFD eviction fences on
> > > > > +	 * evict or move jobs. Skip KFD fences otherwise.
> > > > > +	 */
> > > > > +	if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > > > +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > +		return false;
> > > > > +
> > > > > +	/* Never sync to VM updates either. */
> > > > > +	if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > > > +	    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > +		return false;
> > > > > +
> > > > > +	/* Ignore fences depending on the sync mode */
> > > > > +	switch (mode) {
> > > > > +	case AMDGPU_SYNC_ALWAYS:
> > > > > +		return true;
> > > > > +
> > > > > +	case AMDGPU_SYNC_NE_OWNER:
> > > > > +		if (amdgpu_sync_same_dev(adev, f) &&
> > > > > +		    fence_owner == owner)
> > > > > +			return false;
> > > > > +		break;
> > > > > +
> > > > > +	case AMDGPU_SYNC_EQ_OWNER:
> > > > > +		if (amdgpu_sync_same_dev(adev, f) &&
> > > > > +		    fence_owner != owner)
> > > > > +			return false;
> > > > > +		break;
> > > > > +
> > > > > +	case AMDGPU_SYNC_EXPLICIT:
> > > > > +		return false;
> > > > > +	}
> > > > > +
> > > > > +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > +	     "Adding eviction fence to sync obj");
> > > > > +	return true;
> > > > > +}
> > > > > +
> > > > >    /**
> > > > >     * amdgpu_sync_resv - sync to a reservation object
> > > > >     *
> > > > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
> > > > >    	/* always sync to the exclusive fence */
> > > > >    	f = dma_resv_excl_fence(resv);
> > > > > -	r = amdgpu_sync_fence(sync, f);
> > > > > +	dma_fence_chain_for_each(f, f) {
> > > > Jason has some helper for deep-walking fence chains/arrays here I think.
> > > > Might want to look into that, so that we have some consistency in how we
> > > > pile up multiple exclusive fences.
> > > Well those helpers are not from Jason, but from me :)
> > > 
> > > But no, for now the deep inspection is not really helpful here since
> > > grabbing a reference to a certain chain node is what makes the handling
> > > easier and faster here.
> > > 
> > > Thinking more about it that should also make it possible for the garbage
> > > collection to kick in properly.
> > Hm this is tricky to reason about, but yeah with this here it's a true
> > chain, and you just need to connect them. But then if a buffer is on
> > multiple engines, collapsing things down occasionally might be useful.
> > 
> > But maybe we need to do that in the bigger rework where exclusive fences
> > are also just in the dma_fence_list with a "this is an exclusive one btw"
> > tag.
> > 
> > I think for the vk import case doing the deep scan makes more sense, it's
> > a once-per-frame thing, and there's a much bigger chance that you have a
> > pile of fences from different engines on it already.
> 
> The problem with Jason's IOCTL is that you *must* do a deep dive and flatten
> out the fences.
> 
> Otherwise somebody could use it to create a deep fence structure with
> dma_fence_arrays containing dma_fence_arrays, containing dma_fence_arrays
> etc...
> 
> When you then release that structure you overwrite the kernel stack because
> the dma_fence_array does a dma_fence_put() on its entries :)
> 
> The dma_fence_chain container is intentionally made in a way to prevent
> that.

I'm maybe too dense, but why can't we use a chain with Jason's work too?
At least from high-level semantics it feels like we're doing exactly the
same thing here as the import ioctl does. Only thing different is that the
import ioctl doesn't do an actual CS or anything like that.
-Daniel


> > I think a comment explaining why we think deep scan isn't a good idea here
> > would be good, just so we can appreciate our foolishness when it all goes
> > wrong :-)
> 
> Ok, good point.
> 
> Thanks,
> Christian.
> 
> > -Daniel
> > 
> > 
> > > > Anyway pretty much one of the versions I had in mind too, except I didn't
> > > > type it up.
> > > > 
> > > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Thanks,
> > > Christian.
> > > 
> > > > > +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > > > > +
> > > > > +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > > > > +					   chain->fence : f)) {
> > > > > +			r = amdgpu_sync_fence(sync, f);
> > > > > +			dma_fence_put(f);
> > > > > +			if (r)
> > > > > +				return r;
> > > > > +			break;
> > > > > +		}
> > > > > +	}
> > > > >    	flist = dma_resv_shared_list(resv);
> > > > > -	if (!flist || r)
> > > > > -		return r;
> > > > > +	if (!flist)
> > > > > +		return 0;
> > > > >    	for (i = 0; i < flist->shared_count; ++i) {
> > > > > -		void *fence_owner;
> > > > > -
> > > > >    		f = rcu_dereference_protected(flist->shared[i],
> > > > >    					      dma_resv_held(resv));
> > > > > -		fence_owner = amdgpu_sync_get_owner(f);
> > > > > -
> > > > > -		/* Always sync to moves, no matter what */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > > > > +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > > > >    			r = amdgpu_sync_fence(sync, f);
> > > > >    			if (r)
> > > > > -				break;
> > > > > -		}
> > > > > -
> > > > > -		/* We only want to trigger KFD eviction fences on
> > > > > -		 * evict or move jobs. Skip KFD fences otherwise.
> > > > > -		 */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > -			continue;
> > > > > -
> > > > > -		/* Never sync to VM updates either. */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > -			continue;
> > > > > -
> > > > > -		/* Ignore fences depending on the sync mode */
> > > > > -		switch (mode) {
> > > > > -		case AMDGPU_SYNC_ALWAYS:
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_NE_OWNER:
> > > > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > > > -			    fence_owner == owner)
> > > > > -				continue;
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_EQ_OWNER:
> > > > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > > > -			    fence_owner != owner)
> > > > > -				continue;
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_EXPLICIT:
> > > > > -			continue;
> > > > > +				return r;
> > > > >    		}
> > > > > -
> > > > > -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > -		     "Adding eviction fence to sync obj");
> > > > > -		r = amdgpu_sync_fence(sync, f);
> > > > > -		if (r)
> > > > > -			break;
> > > > >    	}
> > > > > -	return r;
> > > > > +	return 0;
> > > > >    }
> > > > >    /**
> > > > > -- 
> > > > > 2.25.1
> > > > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

> > > > > +		    fence_owner != owner)
> > > > > +			return false;
> > > > > +		break;
> > > > > +
> > > > > +	case AMDGPU_SYNC_EXPLICIT:
> > > > > +		return false;
> > > > > +	}
> > > > > +
> > > > > +	WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > +	     "Adding eviction fence to sync obj");
> > > > > +	return true;
> > > > > +}
> > > > > +
> > > > >    /**
> > > > >     * amdgpu_sync_resv - sync to a reservation object
> > > > >     *
> > > > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
> > > > >    	/* always sync to the exclusive fence */
> > > > >    	f = dma_resv_excl_fence(resv);
> > > > > -	r = amdgpu_sync_fence(sync, f);
> > > > > +	dma_fence_chain_for_each(f, f) {
> > > > Jason has some helper for deep-walking fence chains/arrays here I think.
> > > > Might want to look into that, so that we have some consistency in how we
> > > > pile up multiple exclusive fences.
> > > Well those helpers are not from Jason, but from me :)
> > > 
> > > But no, for now the deep inspection is not really helpful here since
> > > grabbing a reference to a certain chain node is what makes the handling
> > > easier and faster here.
> > > 
> > > Thinking more about it that should also make it possible for the garbage
> > > collection to kick in properly.
> > Hm this is tricky to reason about, but yeah with this here it's a true
> > chain, and you just need to connect them. But then if a buffer is on
> > multiple engines, collapsing things down occasionally might be useful.
> > 
> > But maybe we need to do that in the bigger rework where exclusive fences
> > are also just in the dma_fence_list with a "this is an exclusive one btw"
> > tag.
> > 
> > I think for the vk import case doing the deep scan makes more sense, it's
> > a once-per-frame thing, and there's a much bigger chance that you have a
> > pile of fences from different engines on it already.
> 
> The problem with Jason's IOCTL is that you *must* do a deep dive and flatten
> out the fences.
> 
> Otherwise somebody could use it to create a deep fence structure with
> dma_fence_arrays containing dma_fence_arrays, containing dma_fence_arrays
> etc...
> 
> When you then release that structure you overflow the kernel stack because
> the dma_fence_array does a dma_fence_put() on its entries :)
> 
> The dma_fence_chain container is intentionally made in a way to prevent
> that.

I'm maybe too dense, but why can't we use a chain with Jason's work too?
At least from high-level semantics it feels like we're doing exactly the
same thing here as the import ioctl does. Only thing different is that the
import ioctl doesn't do an actual CS or anything like that.
-Daniel


> > I think a comment explaining why we think deep scan isn't a good idea here
> > would be good, just so we can appreciate our foolishness when it all goes
> > wrong :-)
> 
> Ok, good point.
> 
> Thanks,
> Christian.
> 
> > -Daniel
> > 
> > 
> > > > Anyway pretty much one of the versions I had in mind too, except I didn't
> > > > type it up.
> > > > 
> > > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Thanks,
> > > Christian.
> > > 
> > > > > +		struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > > > > +
> > > > > +		if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > > > > +					   chain->fence : f)) {
> > > > > +			r = amdgpu_sync_fence(sync, f);
> > > > > +			dma_fence_put(f);
> > > > > +			if (r)
> > > > > +				return r;
> > > > > +			break;
> > > > > +		}
> > > > > +	}
> > > > >    	flist = dma_resv_shared_list(resv);
> > > > > -	if (!flist || r)
> > > > > -		return r;
> > > > > +	if (!flist)
> > > > > +		return 0;
> > > > >    	for (i = 0; i < flist->shared_count; ++i) {
> > > > > -		void *fence_owner;
> > > > > -
> > > > >    		f = rcu_dereference_protected(flist->shared[i],
> > > > >    					      dma_resv_held(resv));
> > > > > -		fence_owner = amdgpu_sync_get_owner(f);
> > > > > -
> > > > > -		/* Always sync to moves, no matter what */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > > > > +		if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > > > >    			r = amdgpu_sync_fence(sync, f);
> > > > >    			if (r)
> > > > > -				break;
> > > > > -		}
> > > > > -
> > > > > -		/* We only want to trigger KFD eviction fences on
> > > > > -		 * evict or move jobs. Skip KFD fences otherwise.
> > > > > -		 */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > -			continue;
> > > > > -
> > > > > -		/* Never sync to VM updates either. */
> > > > > -		if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > > > -		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > -			continue;
> > > > > -
> > > > > -		/* Ignore fences depending on the sync mode */
> > > > > -		switch (mode) {
> > > > > -		case AMDGPU_SYNC_ALWAYS:
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_NE_OWNER:
> > > > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > > > -			    fence_owner == owner)
> > > > > -				continue;
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_EQ_OWNER:
> > > > > -			if (amdgpu_sync_same_dev(adev, f) &&
> > > > > -			    fence_owner != owner)
> > > > > -				continue;
> > > > > -			break;
> > > > > -
> > > > > -		case AMDGPU_SYNC_EXPLICIT:
> > > > > -			continue;
> > > > > +				return r;
> > > > >    		}
> > > > > -
> > > > > -		WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > -		     "Adding eviction fence to sync obj");
> > > > > -		r = amdgpu_sync_fence(sync, f);
> > > > > -		if (r)
> > > > > -			break;
> > > > >    	}
> > > > > -	return r;
> > > > > +	return 0;
> > > > >    }
> > > > >    /**
> > > > > -- 
> > > > > 2.25.1
> > > > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2021-06-17 16:51 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10  9:17 Change how amdgpu stores fences in dma_resv objects Christian König
2021-06-10  9:17 ` Christian König
2021-06-10  9:17 ` [PATCH 1/7] dma-buf: some dma_fence_chain improvements Christian König
2021-06-10  9:17   ` Christian König
2021-06-11  7:49   ` Daniel Vetter
2021-06-11  7:49     ` Daniel Vetter
2021-06-10  9:17 ` [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free Christian König
2021-06-10  9:17   ` Christian König
2021-06-11  7:54   ` Daniel Vetter
2021-06-11  7:54     ` Daniel Vetter
2021-06-11 11:48     ` Christian König
2021-06-11 11:48       ` Christian König
2021-06-11 14:40       ` Daniel Vetter
2021-06-11 14:40         ` Daniel Vetter
2021-06-10  9:17 ` [PATCH 3/7] dma-buf: add dma_fence_chain_alloc/free self tests Christian König
2021-06-10  9:17   ` Christian König
2021-06-11  7:58   ` Daniel Vetter
2021-06-11  7:58     ` Daniel Vetter
2021-06-11 10:04     ` Christian König
2021-06-11 10:04       ` Christian König
2021-06-10  9:17 ` [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect Christian König
2021-06-10  9:17   ` Christian König
2021-06-11  8:58   ` Daniel Vetter
2021-06-11  8:58     ` Daniel Vetter
2021-06-11 10:07     ` Christian König
2021-06-11 10:07       ` Christian König
2021-06-11 15:11       ` Daniel Vetter
2021-06-11 15:11         ` Daniel Vetter
2021-06-16 17:29   ` kernel test robot
2021-06-16 17:29     ` kernel test robot
2021-06-16 18:50   ` kernel test robot
2021-06-16 18:50     ` kernel test robot
2021-06-10  9:17 ` [PATCH 5/7] drm/syncobj: drop the manual garbage collection Christian König
2021-06-10  9:17   ` Christian König
2021-06-10  9:17 ` [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence Christian König
2021-06-10  9:17   ` Christian König
2021-06-11  9:07   ` Daniel Vetter
2021-06-11  9:07     ` Daniel Vetter
2021-06-11 10:09     ` Christian König
2021-06-11 10:09       ` Christian König
2021-06-11 15:18       ` Daniel Vetter
2021-06-11 15:18         ` Daniel Vetter
2021-06-14  7:25         ` Christian König
2021-06-14  7:25           ` Christian König
2021-06-17 16:51           ` Daniel Vetter
2021-06-17 16:51             ` Daniel Vetter
2021-06-10  9:18 ` [PATCH 7/7] drm/amdgpu: rework dma_resv handling Christian König
2021-06-10  9:18   ` Christian König
2021-06-11  9:17   ` Daniel Vetter
2021-06-11  9:17     ` Daniel Vetter
2021-06-11 10:12     ` Christian König
2021-06-11 10:12       ` Christian König
2021-06-11 15:21       ` Daniel Vetter
2021-06-11 15:21         ` Daniel Vetter
2021-06-10 16:34 ` Change how amdgpu stores fences in dma_resv objects Michel Dänzer
2021-06-10 16:34   ` Michel Dänzer
2021-06-10 16:44   ` Christian König
2021-06-10 16:44     ` Christian König
