All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
@ 2022-09-14 16:43 Arvind Yadav
  2022-09-14 16:43   ` Arvind Yadav
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav

Fence signaling must be enabled to make sure that
the dma_fence_is_signaled() function ever returns true.
Since drivers and implementations sometimes mess this up,
this ensures correct behaviour when DEBUG_WW_MUTEX_SLOWPATH
is used during debugging.
This should make any implementation bugs resulting in not
signaled fences much more obvious.

Arvind Yadav (6):
  [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
  [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
  [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
  [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
  [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug

 drivers/dma-buf/Kconfig                |  7 +++++++
 drivers/dma-buf/dma-fence.c            | 16 ++++++++++------
 drivers/dma-buf/st-dma-fence-chain.c   |  4 ++++
 drivers/dma-buf/st-dma-fence-unwrap.c  | 22 ++++++++++++++++++++++
 drivers/dma-buf/st-dma-fence.c         | 16 ++++++++++++++++
 drivers/dma-buf/st-dma-resv.c          | 10 ++++++++++
 drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
 include/linux/dma-fence.h              |  5 +++++
 8 files changed, 76 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
@ 2022-09-14 16:43   ` Arvind Yadav
  2022-09-14 16:43   ` Arvind Yadav
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav, Christian König

Remove the signaled bit status check because it is returning
early when the fence is already signaled and
__dma_fence_enable_signaling is checking the status of signaled
bit again.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1, v2: This new patch was not part of previous series.

---

 drivers/dma-buf/dma-fence.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..64c99739ad23 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -601,9 +601,6 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
 {
 	unsigned long flags;
 
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
-		return;
-
 	spin_lock_irqsave(fence->lock, flags);
 	__dma_fence_enable_signaling(fence);
 	spin_unlock_irqrestore(fence->lock, flags);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
@ 2022-09-14 16:43   ` Arvind Yadav
  0 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Christian König, Arvind Yadav

Remove the signaled bit status check because it is returning
early when the fence is already signaled and
__dma_fence_enable_signaling is checking the status of signaled
bit again.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1, v2: This new patch was not part of previous series.

---

 drivers/dma-buf/dma-fence.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..64c99739ad23 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -601,9 +601,6 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
 {
 	unsigned long flags;
 
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
-		return;
-
 	spin_lock_irqsave(fence->lock, flags);
 	__dma_fence_enable_signaling(fence);
 	spin_unlock_irqrestore(fence->lock, flags);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
@ 2022-09-14 16:43   ` Arvind Yadav
  2022-09-14 16:43   ` Arvind Yadav
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav, Christian König

Here's setting software signaling bit for the stub fence
which is always signaled. If this fence signaling bit is
not set then the AMD GPU scheduler will cause a GPU reset
due to a GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 3/4]

Changes in v2 :
1 - perviously using  __dma_fence_enable_signaling() for enable
signaling.
2 - #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH  removed

Changes in v3 :
1 - Enable Signaling bit for dma_fence_allocate_private_stub.

---
 drivers/dma-buf/dma-fence.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 64c99739ad23..645c158b7e01 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -136,6 +136,10 @@ struct dma_fence *dma_fence_get_stub(void)
 			       &dma_fence_stub_ops,
 			       &dma_fence_stub_lock,
 			       0, 0);
+
+		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			&dma_fence_stub.flags);
+
 		dma_fence_signal_locked(&dma_fence_stub);
 	}
 	spin_unlock(&dma_fence_stub_lock);
@@ -161,6 +165,10 @@ struct dma_fence *dma_fence_allocate_private_stub(void)
 		       &dma_fence_stub_ops,
 		       &dma_fence_stub_lock,
 		       0, 0);
+
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+		&dma_fence_stub.flags);
+
 	dma_fence_signal(fence);
 
 	return fence;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
@ 2022-09-14 16:43   ` Arvind Yadav
  0 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Christian König, Arvind Yadav

Here's setting software signaling bit for the stub fence
which is always signaled. If this fence signaling bit is
not set then the AMD GPU scheduler will cause a GPU reset
due to a GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 3/4]

Changes in v2 :
1 - perviously using  __dma_fence_enable_signaling() for enable
signaling.
2 - #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH  removed

Changes in v3 :
1 - Enable Signaling bit for dma_fence_allocate_private_stub.

---
 drivers/dma-buf/dma-fence.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 64c99739ad23..645c158b7e01 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -136,6 +136,10 @@ struct dma_fence *dma_fence_get_stub(void)
 			       &dma_fence_stub_ops,
 			       &dma_fence_stub_lock,
 			       0, 0);
+
+		set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			&dma_fence_stub.flags);
+
 		dma_fence_signal_locked(&dma_fence_stub);
 	}
 	spin_unlock(&dma_fence_stub_lock);
@@ -161,6 +165,10 @@ struct dma_fence *dma_fence_allocate_private_stub(void)
 		       &dma_fence_stub_ops,
 		       &dma_fence_stub_lock,
 		       0, 0);
+
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+		&dma_fence_stub.flags);
+
 	dma_fence_signal(fence);
 
 	return fence;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
@ 2022-09-14 16:43   ` Arvind Yadav
  2022-09-14 16:43   ` Arvind Yadav
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav, Christian König

Here's enabling software signaling on fence for selftest.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 4/4]

Changes in v2 :
1- #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH removed

---

 drivers/dma-buf/st-dma-fence-chain.c  |  4 ++++
 drivers/dma-buf/st-dma-fence-unwrap.c | 22 ++++++++++++++++++++++
 drivers/dma-buf/st-dma-fence.c        | 16 ++++++++++++++++
 drivers/dma-buf/st-dma-resv.c         | 10 ++++++++++
 4 files changed, 52 insertions(+)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 8ce1ea59d31b..0a9b099d0518 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -87,6 +87,8 @@ static int sanitycheck(void *arg)
 	if (!chain)
 		err = -ENOMEM;
 
+	dma_fence_enable_sw_signaling(chain);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -143,6 +145,8 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
 		}
 
 		fc->tail = fc->chains[i];
+
+		dma_fence_enable_sw_signaling(fc->chains[i]);
 	}
 
 	fc->chain_length = i;
diff --git a/drivers/dma-buf/st-dma-fence-unwrap.c b/drivers/dma-buf/st-dma-fence-unwrap.c
index 4105d5ea8dde..f0cee984b6c7 100644
--- a/drivers/dma-buf/st-dma-fence-unwrap.c
+++ b/drivers/dma-buf/st-dma-fence-unwrap.c
@@ -102,6 +102,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	array = mock_array(1, f);
 	if (!array)
 		return -ENOMEM;
@@ -124,12 +126,16 @@ static int unwrap_array(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	array = mock_array(2, f1, f2);
 	if (!array)
 		return -ENOMEM;
@@ -164,12 +170,16 @@ static int unwrap_chain(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	chain = mock_chain(f1, f2);
 	if (!chain)
 		return -ENOMEM;
@@ -204,12 +214,16 @@ static int unwrap_chain_array(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	array = mock_array(2, f1, f2);
 	if (!array)
 		return -ENOMEM;
@@ -248,12 +262,16 @@ static int unwrap_merge(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		err = -ENOMEM;
 		goto error_put_f1;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	f3 = dma_fence_unwrap_merge(f1, f2);
 	if (!f3) {
 		err = -ENOMEM;
@@ -296,10 +314,14 @@ static int unwrap_merge_complex(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2)
 		goto error_put_f1;
 
+	dma_fence_enable_sw_signaling(f2);
+
 	f3 = dma_fence_unwrap_merge(f1, f2);
 	if (!f3)
 		goto error_put_f2;
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index c8a12d7ad71a..fb6e0a6ae2c9 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -102,6 +102,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -117,6 +119,8 @@ static int test_signaling(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_is_signaled(f)) {
 		pr_err("Fence unexpectedly signaled on creation\n");
 		goto err_free;
@@ -190,6 +194,8 @@ static int test_late_add_callback(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 
 	if (!dma_fence_add_callback(f, &cb.cb, simple_callback)) {
@@ -282,6 +288,8 @@ static int test_status(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_get_status(f)) {
 		pr_err("Fence unexpectedly has signaled status on creation\n");
 		goto err_free;
@@ -308,6 +316,8 @@ static int test_error(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_set_error(f, -EIO);
 
 	if (dma_fence_get_status(f)) {
@@ -337,6 +347,8 @@ static int test_wait(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_wait_timeout(f, false, 0) != -ETIME) {
 		pr_err("Wait reported complete before being signaled\n");
 		goto err_free;
@@ -379,6 +391,8 @@ static int test_wait_timeout(void *arg)
 	if (!wt.f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(wt.f);
+
 	if (dma_fence_wait_timeout(wt.f, false, 1) != -ETIME) {
 		pr_err("Wait reported complete before being signaled\n");
 		goto err_free;
@@ -458,6 +472,8 @@ static int thread_signal_callback(void *arg)
 			break;
 		}
 
+		dma_fence_enable_sw_signaling(f1);
+
 		rcu_assign_pointer(t->fences[t->id], f1);
 		smp_wmb();
 
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
index 813779e3c9be..15dbea1462ed 100644
--- a/drivers/dma-buf/st-dma-resv.c
+++ b/drivers/dma-buf/st-dma-resv.c
@@ -45,6 +45,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -69,6 +71,8 @@ static int test_signaling(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -114,6 +118,8 @@ static int test_for_each(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -173,6 +179,8 @@ static int test_for_each_unlocked(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -244,6 +252,8 @@ static int test_get_fences(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
@ 2022-09-14 16:43   ` Arvind Yadav
  0 siblings, 0 replies; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Christian König, Arvind Yadav

Here's enabling software signaling on fence for selftest.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 4/4]

Changes in v2 :
1- #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH removed

---

 drivers/dma-buf/st-dma-fence-chain.c  |  4 ++++
 drivers/dma-buf/st-dma-fence-unwrap.c | 22 ++++++++++++++++++++++
 drivers/dma-buf/st-dma-fence.c        | 16 ++++++++++++++++
 drivers/dma-buf/st-dma-resv.c         | 10 ++++++++++
 4 files changed, 52 insertions(+)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 8ce1ea59d31b..0a9b099d0518 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -87,6 +87,8 @@ static int sanitycheck(void *arg)
 	if (!chain)
 		err = -ENOMEM;
 
+	dma_fence_enable_sw_signaling(chain);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -143,6 +145,8 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
 		}
 
 		fc->tail = fc->chains[i];
+
+		dma_fence_enable_sw_signaling(fc->chains[i]);
 	}
 
 	fc->chain_length = i;
diff --git a/drivers/dma-buf/st-dma-fence-unwrap.c b/drivers/dma-buf/st-dma-fence-unwrap.c
index 4105d5ea8dde..f0cee984b6c7 100644
--- a/drivers/dma-buf/st-dma-fence-unwrap.c
+++ b/drivers/dma-buf/st-dma-fence-unwrap.c
@@ -102,6 +102,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	array = mock_array(1, f);
 	if (!array)
 		return -ENOMEM;
@@ -124,12 +126,16 @@ static int unwrap_array(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	array = mock_array(2, f1, f2);
 	if (!array)
 		return -ENOMEM;
@@ -164,12 +170,16 @@ static int unwrap_chain(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	chain = mock_chain(f1, f2);
 	if (!chain)
 		return -ENOMEM;
@@ -204,12 +214,16 @@ static int unwrap_chain_array(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		dma_fence_put(f1);
 		return -ENOMEM;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	array = mock_array(2, f1, f2);
 	if (!array)
 		return -ENOMEM;
@@ -248,12 +262,16 @@ static int unwrap_merge(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2) {
 		err = -ENOMEM;
 		goto error_put_f1;
 	}
 
+	dma_fence_enable_sw_signaling(f2);
+
 	f3 = dma_fence_unwrap_merge(f1, f2);
 	if (!f3) {
 		err = -ENOMEM;
@@ -296,10 +314,14 @@ static int unwrap_merge_complex(void *arg)
 	if (!f1)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f1);
+
 	f2 = mock_fence();
 	if (!f2)
 		goto error_put_f1;
 
+	dma_fence_enable_sw_signaling(f2);
+
 	f3 = dma_fence_unwrap_merge(f1, f2);
 	if (!f3)
 		goto error_put_f2;
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index c8a12d7ad71a..fb6e0a6ae2c9 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -102,6 +102,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -117,6 +119,8 @@ static int test_signaling(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_is_signaled(f)) {
 		pr_err("Fence unexpectedly signaled on creation\n");
 		goto err_free;
@@ -190,6 +194,8 @@ static int test_late_add_callback(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 
 	if (!dma_fence_add_callback(f, &cb.cb, simple_callback)) {
@@ -282,6 +288,8 @@ static int test_status(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_get_status(f)) {
 		pr_err("Fence unexpectedly has signaled status on creation\n");
 		goto err_free;
@@ -308,6 +316,8 @@ static int test_error(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_set_error(f, -EIO);
 
 	if (dma_fence_get_status(f)) {
@@ -337,6 +347,8 @@ static int test_wait(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	if (dma_fence_wait_timeout(f, false, 0) != -ETIME) {
 		pr_err("Wait reported complete before being signaled\n");
 		goto err_free;
@@ -379,6 +391,8 @@ static int test_wait_timeout(void *arg)
 	if (!wt.f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(wt.f);
+
 	if (dma_fence_wait_timeout(wt.f, false, 1) != -ETIME) {
 		pr_err("Wait reported complete before being signaled\n");
 		goto err_free;
@@ -458,6 +472,8 @@ static int thread_signal_callback(void *arg)
 			break;
 		}
 
+		dma_fence_enable_sw_signaling(f1);
+
 		rcu_assign_pointer(t->fences[t->id], f1);
 		smp_wmb();
 
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
index 813779e3c9be..15dbea1462ed 100644
--- a/drivers/dma-buf/st-dma-resv.c
+++ b/drivers/dma-buf/st-dma-resv.c
@@ -45,6 +45,8 @@ static int sanitycheck(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_fence_signal(f);
 	dma_fence_put(f);
 
@@ -69,6 +71,8 @@ static int test_signaling(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -114,6 +118,8 @@ static int test_for_each(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -173,6 +179,8 @@ static int test_for_each_unlocked(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
@@ -244,6 +252,8 @@ static int test_get_fences(void *arg)
 	if (!f)
 		return -ENOMEM;
 
+	dma_fence_enable_sw_signaling(f);
+
 	dma_resv_init(&resv);
 	r = dma_resv_lock(&resv, NULL);
 	if (r) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
                   ` (2 preceding siblings ...)
  2022-09-14 16:43   ` Arvind Yadav
@ 2022-09-14 16:43 ` Arvind Yadav
  2022-09-15 12:06   ` Christian König
  2022-09-14 16:43 ` [PATCH v4 5/6] drm/sched: Use parent fence instead of finished Arvind Yadav
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav

dma_fence_wait() should always enable signaling even
when the fence is already signaled.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
---

Changes in v1..v3: This new patch was not part of previous series.

---

 drivers/dma-buf/dma-fence.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 645c158b7e01..a5fbf1c1e0ea 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -508,6 +508,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 
 	__dma_fence_might_wait();
 
+	dma_fence_enable_sw_signaling(fence);
+
 	trace_dma_fence_wait_start(fence);
 	if (fence->ops->wait)
 		ret = fence->ops->wait(fence, intr, timeout);
@@ -771,9 +773,6 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 		goto out;
 	}
 
-	if (!__dma_fence_enable_signaling(fence))
-		goto out;
-
 	if (!timeout) {
 		ret = 0;
 		goto out;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
                   ` (3 preceding siblings ...)
  2022-09-14 16:43 ` [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling Arvind Yadav
@ 2022-09-14 16:43 ` Arvind Yadav
  2022-09-29 14:53   ` Steven Price
  2022-09-14 16:43 ` [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
  2022-09-15 12:07 ` [PATCH v4 0/6] " Christian König
  6 siblings, 1 reply; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav

Using the parent fence instead of the finished fence
to get the job status. This change is to avoid GPU
scheduler timeout error which can cause GPU reset.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---

changes in v1,v2 - Enable signaling for finished fence in sche_main()
is removed

---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e0ab14e0fb6b..2ac28ad11432 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
-	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
+	if (job && dma_fence_is_signaled(job->s_fence->parent)) {
 		/* remove job from pending_list */
 		list_del_init(&job->list);
 
@@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 
 		if (next) {
 			next->s_fence->scheduled.timestamp =
-				job->s_fence->finished.timestamp;
+				job->s_fence->parent->timestamp;
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
                   ` (4 preceding siblings ...)
  2022-09-14 16:43 ` [PATCH v4 5/6] drm/sched: Use parent fence instead of finished Arvind Yadav
@ 2022-09-14 16:43 ` Arvind Yadav
  2022-09-15 12:06   ` Christian König
  2022-09-15 12:07 ` [PATCH v4 0/6] " Christian König
  6 siblings, 1 reply; 27+ messages in thread
From: Arvind Yadav @ 2022-09-14 16:43 UTC (permalink / raw)
  To: Christian.Koenig, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: Arvind Yadav

Fence signaling must be enabled to make sure that
the dma_fence_is_signaled() function ever returns true.
Since drivers and implementations sometimes mess this up,
this ensures correct behaviour when DMABUF_DEBUG_ENABLE_SIGNALING
is used during debugging.
This should make any implementation bugs resulting in not
signaled fences much more obvious.

Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
---

Changes in v1,v2 :
1- Addressing Christian's comment to replace
CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
2- As per Christian's comment moving this patch at last so
The version of this patch is also changed and previously
it was [PATCH 1/4]

Changes in v3: 
1 - Adding new config DMABUF_DEBUG_ENABLE_SIGNALING.
2 - Replace CONFIG_DEBUG_WW_MUTEX_SLOWPATH to new
CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING.

---
 drivers/dma-buf/Kconfig   | 7 +++++++
 include/linux/dma-fence.h | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index e4dc53a36428..c991e6a51510 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -65,6 +65,13 @@ config DMABUF_SELFTESTS
 	default n
 	depends on DMA_SHARED_BUFFER
 
+config DMABUF_DEBUG_ENABLE_SIGNALING
+	bool "DMA Fence enable signaling debug checks"
+	default n
+	depends on DMA_SHARED_BUFFER
+	help
+	  This option enables additional checks for software signaling of fence.
+
 menuconfig DMABUF_HEAPS
 	bool "DMA-BUF Userland Memory Heaps"
 	select DMA_SHARED_BUFFER
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..01e1fa4d3cec 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
 static inline bool
 dma_fence_is_signaled(struct dma_fence *fence)
 {
+#ifdef CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING
+	if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+		return false;
+#endif
+
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
  2022-09-14 16:43 ` [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling Arvind Yadav
@ 2022-09-15 12:06   ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-15 12:06 UTC (permalink / raw)
  To: Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 14.09.22 um 18:43 schrieb Arvind Yadav:
> dma_fence_wait() should always enable signaling even
> when the fence is already signaled.
>
> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>
> Changes in v1..v3: This new patch was not part of previous series.
>
> ---
>
>   drivers/dma-buf/dma-fence.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 645c158b7e01..a5fbf1c1e0ea 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -508,6 +508,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
>   
>   	__dma_fence_might_wait();
>   
> +	dma_fence_enable_sw_signaling(fence);
> +
>   	trace_dma_fence_wait_start(fence);
>   	if (fence->ops->wait)
>   		ret = fence->ops->wait(fence, intr, timeout);
> @@ -771,9 +773,6 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
>   		goto out;
>   	}
>   
> -	if (!__dma_fence_enable_signaling(fence))
> -		goto out;
> -
>   	if (!timeout) {
>   		ret = 0;
>   		goto out;


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-14 16:43 ` [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
@ 2022-09-15 12:06   ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-15 12:06 UTC (permalink / raw)
  To: Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 14.09.22 um 18:43 schrieb Arvind Yadav:
> Fence signaling must be enabled to make sure that
> the dma_fence_is_signaled() function ever returns true.
> Since drivers and implementations sometimes mess this up,
> this ensures correct behaviour when DMABUF_DEBUG_ENABLE_SIGNALING
> is used during debugging.
> This should make any implementation bugs resulting in not
> signaled fences much more obvious.
>
> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>
> Changes in v1,v2 :
> 1- Addressing Christian's comment to replace
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
> 2- As per Christian's comment moving this patch at last so
> The version of this patch is also changed and previously
> it was [PATCH 1/4]
>
> Changes in v3:
> 1 - Adding new config DMABUF_DEBUG_ENABLE_SIGNALING.
> 2 - Replace CONFIG_DEBUG_WW_MUTEX_SLOWPATH to new
> CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING.
>
> ---
>   drivers/dma-buf/Kconfig   | 7 +++++++
>   include/linux/dma-fence.h | 5 +++++
>   2 files changed, 12 insertions(+)
>
> diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
> index e4dc53a36428..c991e6a51510 100644
> --- a/drivers/dma-buf/Kconfig
> +++ b/drivers/dma-buf/Kconfig
> @@ -65,6 +65,13 @@ config DMABUF_SELFTESTS
>   	default n
>   	depends on DMA_SHARED_BUFFER
>   
> +config DMABUF_DEBUG_ENABLE_SIGNALING
> +	bool "DMA Fence enable signaling debug checks"
> +	default n
> +	depends on DMA_SHARED_BUFFER
> +	help
> +	  This option enables additional checks for software signaling of fence.
> +
>   menuconfig DMABUF_HEAPS
>   	bool "DMA-BUF Userland Memory Heaps"
>   	select DMA_SHARED_BUFFER
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 775cdc0b4f24..01e1fa4d3cec 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>   static inline bool
>   dma_fence_is_signaled(struct dma_fence *fence)
>   {
> +#ifdef CONFIG_DMABUF_DEBUG_ENABLE_SIGNALING
> +	if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> +		return false;
> +#endif
> +
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return true;
>   


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
                   ` (5 preceding siblings ...)
  2022-09-14 16:43 ` [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
@ 2022-09-15 12:07 ` Christian König
  2022-09-15 13:02   ` Yadav, Arvind
  6 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-09-15 12:07 UTC (permalink / raw)
  To: Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Is that sufficient to allow running a desktop on amdgpu with the extra 
check enabled? If yes that would be quite a milestone.

What's left is checking the userspace IGT tests. Especially the 
sync_file and drm_syncobj tests I would expect to have problems with 
this extra check.

Thanks,
Christian.

Am 14.09.22 um 18:43 schrieb Arvind Yadav:
> Fence signaling must be enabled to make sure that
> the dma_fence_is_signaled() function ever returns true.
> Since drivers and implementations sometimes mess this up,
> this ensures correct behaviour when DEBUG_WW_MUTEX_SLOWPATH
> is used during debugging.
> This should make any implementation bugs resulting in not
> signaled fences much more obvious.
>
> Arvind Yadav (6):
>    [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
>    [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
>    [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
>    [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
>    [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
>    [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug
>
>   drivers/dma-buf/Kconfig                |  7 +++++++
>   drivers/dma-buf/dma-fence.c            | 16 ++++++++++------
>   drivers/dma-buf/st-dma-fence-chain.c   |  4 ++++
>   drivers/dma-buf/st-dma-fence-unwrap.c  | 22 ++++++++++++++++++++++
>   drivers/dma-buf/st-dma-fence.c         | 16 ++++++++++++++++
>   drivers/dma-buf/st-dma-resv.c          | 10 ++++++++++
>   drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
>   include/linux/dma-fence.h              |  5 +++++
>   8 files changed, 76 insertions(+), 8 deletions(-)
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-15 12:07 ` [PATCH v4 0/6] " Christian König
@ 2022-09-15 13:02   ` Yadav, Arvind
  2022-09-15 16:05     ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Yadav, Arvind @ 2022-09-15 13:02 UTC (permalink / raw)
  To: Christian König, Arvind Yadav, andrey.grodzovsky,
	shashank.sharma, amaranath.somalapuram, Arunpravin.PaneerSelvam,
	sumit.semwal, gustavo, airlied, daniel, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel


On 9/15/2022 5:37 PM, Christian König wrote:
> Is that sufficient to allow running a desktop on amdgpu with the extra 
> check enabled? If yes that would be quite a milestone.
>
Yes, It is running on amdgpu with extra config enabled.
> What's left is checking the userspace IGT tests. Especially the 
> sync_file and drm_syncobj tests I would expect to have problems with 
> this extra check.
>
Yes, IGT test cases are failing .

~Arvind

> Thanks,
> Christian.
>
> Am 14.09.22 um 18:43 schrieb Arvind Yadav:
>> Fence signaling must be enabled to make sure that
>> the dma_fence_is_signaled() function ever returns true.
>> Since drivers and implementations sometimes mess this up,
>> this ensures correct behaviour when DEBUG_WW_MUTEX_SLOWPATH
>> is used during debugging.
>> This should make any implementation bugs resulting in not
>> signaled fences much more obvious.
>>
>> Arvind Yadav (6):
>>    [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
>>    [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
>>    [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
>>    [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
>>    [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
>>    [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug
>>
>>   drivers/dma-buf/Kconfig                |  7 +++++++
>>   drivers/dma-buf/dma-fence.c            | 16 ++++++++++------
>>   drivers/dma-buf/st-dma-fence-chain.c   |  4 ++++
>>   drivers/dma-buf/st-dma-fence-unwrap.c  | 22 ++++++++++++++++++++++
>>   drivers/dma-buf/st-dma-fence.c         | 16 ++++++++++++++++
>>   drivers/dma-buf/st-dma-resv.c          | 10 ++++++++++
>>   drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
>>   include/linux/dma-fence.h              |  5 +++++
>>   8 files changed, 76 insertions(+), 8 deletions(-)
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-15 13:02   ` Yadav, Arvind
@ 2022-09-15 16:05     ` Christian König
  2022-09-17  6:17         ` Ville Syrjälä
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-09-15 16:05 UTC (permalink / raw)
  To: Yadav, Arvind, Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
>
> On 9/15/2022 5:37 PM, Christian König wrote:
>> Is that sufficient to allow running a desktop on amdgpu with the 
>> extra check enabled? If yes that would be quite a milestone.
>>
> Yes, It is running on amdgpu with extra config enabled.

In this case I will start pushing the patches to drm-misc-next. I'm just 
going to leave out the last one until the IGT tests are working as well.

>> What's left is checking the userspace IGT tests. Especially the 
>> sync_file and drm_syncobj tests I would expect to have problems with 
>> this extra check.
>>
> Yes, IGT test cases are failing

Yeah, as noted on the call please investigate.

This one is the real reason why I wanted somebody to look at this. My 
suspicion is that we have missing calls to 
dma_fence_enable_sw_signaling() in the drm_syncobj code.

Thanks,
Christian.

>
> ~Arvind
>
>> Thanks,
>> Christian.
>>
>> Am 14.09.22 um 18:43 schrieb Arvind Yadav:
>>> Fence signaling must be enabled to make sure that
>>> the dma_fence_is_signaled() function ever returns true.
>>> Since drivers and implementations sometimes mess this up,
>>> this ensures correct behaviour when DEBUG_WW_MUTEX_SLOWPATH
>>> is used during debugging.
>>> This should make any implementation bugs resulting in not
>>> signaled fences much more obvious.
>>>
>>> Arvind Yadav (6):
>>>    [PATCH v4 1/6] dma-buf: Remove the signaled bit status check
>>>    [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence
>>>    [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests
>>>    [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling
>>>    [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
>>>    [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on 
>>> debug
>>>
>>>   drivers/dma-buf/Kconfig                |  7 +++++++
>>>   drivers/dma-buf/dma-fence.c            | 16 ++++++++++------
>>>   drivers/dma-buf/st-dma-fence-chain.c   |  4 ++++
>>>   drivers/dma-buf/st-dma-fence-unwrap.c  | 22 ++++++++++++++++++++++
>>>   drivers/dma-buf/st-dma-fence.c         | 16 ++++++++++++++++
>>>   drivers/dma-buf/st-dma-resv.c          | 10 ++++++++++
>>>   drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
>>>   include/linux/dma-fence.h              |  5 +++++
>>>   8 files changed, 76 insertions(+), 8 deletions(-)
>>>
>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-15 16:05     ` Christian König
@ 2022-09-17  6:17         ` Ville Syrjälä
  0 siblings, 0 replies; 27+ messages in thread
From: Ville Syrjälä @ 2022-09-17  6:17 UTC (permalink / raw)
  To: Christian König
  Cc: Yadav, Arvind, Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
> >
> > On 9/15/2022 5:37 PM, Christian König wrote:
> >> Is that sufficient to allow running a desktop on amdgpu with the 
> >> extra check enabled? If yes that would be quite a milestone.
> >>
> > Yes, It is running on amdgpu with extra config enabled.
> 
> In this case I will start pushing the patches to drm-misc-next. I'm just 
> going to leave out the last one until the IGT tests are working as well.

ffs Christian. intel CI blew up yet again:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html

The last time (some ttm thing) was just a week or two ago,
so it's really getting tiresome watching you push entirely
untested stuff all the time. Would be really helpful if you
finally started to do/require premerge testing.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
@ 2022-09-17  6:17         ` Ville Syrjälä
  0 siblings, 0 replies; 27+ messages in thread
From: Ville Syrjälä @ 2022-09-17  6:17 UTC (permalink / raw)
  To: Christian König
  Cc: Arunpravin.PaneerSelvam, airlied, gustavo, amaranath.somalapuram,
	linux-kernel, Arvind Yadav, linaro-mm-sig, dri-devel, Yadav,
	Arvind, shashank.sharma, sumit.semwal, linux-media

On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
> >
> > On 9/15/2022 5:37 PM, Christian König wrote:
> >> Is that sufficient to allow running a desktop on amdgpu with the 
> >> extra check enabled? If yes that would be quite a milestone.
> >>
> > Yes, It is running on amdgpu with extra config enabled.
> 
> In this case I will start pushing the patches to drm-misc-next. I'm just 
> going to leave out the last one until the IGT tests are working as well.

ffs Christian. intel CI blew up yet again:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html

The last time (some ttm thing) was just a week or two ago,
so it's really getting tiresome watching you push entirely
untested stuff all the time. Would be really helpful if you
finally started to do/require premerge testing.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-17  6:17         ` Ville Syrjälä
@ 2022-09-17 15:18           ` Christian König
  -1 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-17 15:18 UTC (permalink / raw)
  To: Ville Syrjälä, Christian König
  Cc: Yadav, Arvind, Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
> On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
>> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
>>> On 9/15/2022 5:37 PM, Christian König wrote:
>>>> Is that sufficient to allow running a desktop on amdgpu with the
>>>> extra check enabled? If yes that would be quite a milestone.
>>>>
>>> Yes, It is running on amdgpu with extra config enabled.
>> In this case I will start pushing the patches to drm-misc-next. I'm just
>> going to leave out the last one until the IGT tests are working as well.
> ffs Christian. intel CI blew up yet again:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html
>
> The last time (some ttm thing) was just a week or two ago,
> so it's really getting tiresome watching you push entirely
> untested stuff all the time. Would be really helpful if you
> finally started to do/require premerge testing.

Well first of all sorry for causing trouble, but as I wrote above I 
intentionally left out the last one to *not* break the IGT tests.

The patches pushed so far where just updating a bunch of corner cases 
and fixing the selftests.

Do you have any more insight why that should affect the IGT tests?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
@ 2022-09-17 15:18           ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-17 15:18 UTC (permalink / raw)
  To: Ville Syrjälä, Christian König
  Cc: Arunpravin.PaneerSelvam, airlied, gustavo, amaranath.somalapuram,
	linux-kernel, Arvind Yadav, linaro-mm-sig, dri-devel, Yadav,
	Arvind, shashank.sharma, sumit.semwal, linux-media

Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
> On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
>> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
>>> On 9/15/2022 5:37 PM, Christian König wrote:
>>>> Is that sufficient to allow running a desktop on amdgpu with the
>>>> extra check enabled? If yes that would be quite a milestone.
>>>>
>>> Yes, It is running on amdgpu with extra config enabled.
>> In this case I will start pushing the patches to drm-misc-next. I'm just
>> going to leave out the last one until the IGT tests are working as well.
> ffs Christian. intel CI blew up yet again:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html
>
> The last time (some ttm thing) was just a week or two ago,
> so it's really getting tiresome watching you push entirely
> untested stuff all the time. Would be really helpful if you
> finally started to do/require premerge testing.

Well first of all sorry for causing trouble, but as I wrote above I 
intentionally left out the last one to *not* break the IGT tests.

The patches pushed so far where just updating a bunch of corner cases 
and fixing the selftests.

Do you have any more insight why that should affect the IGT tests?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-17 15:18           ` Christian König
@ 2022-09-19 11:26             ` Ville Syrjälä
  -1 siblings, 0 replies; 27+ messages in thread
From: Ville Syrjälä @ 2022-09-19 11:26 UTC (permalink / raw)
  To: Christian König
  Cc: shashank.sharma, airlied, gustavo, Arunpravin.PaneerSelvam,
	amaranath.somalapuram, linux-kernel, Arvind Yadav, sumit.semwal,
	linaro-mm-sig, dri-devel, Yadav, Arvind, Christian König,
	linux-media

On Sat, Sep 17, 2022 at 05:18:40PM +0200, Christian König wrote:
> Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
> > On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
> >> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
> >>> On 9/15/2022 5:37 PM, Christian König wrote:
> >>>> Is that sufficient to allow running a desktop on amdgpu with the
> >>>> extra check enabled? If yes that would be quite a milestone.
> >>>>
> >>> Yes, It is running on amdgpu with extra config enabled.
> >> In this case I will start pushing the patches to drm-misc-next. I'm just
> >> going to leave out the last one until the IGT tests are working as well.
> > ffs Christian. intel CI blew up yet again:
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html
> >
> > The last time (some ttm thing) was just a week or two ago,
> > so it's really getting tiresome watching you push entirely
> > untested stuff all the time. Would be really helpful if you
> > finally started to do/require premerge testing.
> 
> Well first of all sorry for causing trouble, but as I wrote above I 
> intentionally left out the last one to *not* break the IGT tests.
> 
> The patches pushed so far where just updating a bunch of corner cases 
> and fixing the selftests.
> 
> Do you have any more insight why that should affect the IGT tests?

I have no idea. You have the oopses from pstore right there.
Did you even look at them?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
@ 2022-09-19 11:26             ` Ville Syrjälä
  0 siblings, 0 replies; 27+ messages in thread
From: Ville Syrjälä @ 2022-09-19 11:26 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Yadav, Arvind, Arvind Yadav,
	andrey.grodzovsky, shashank.sharma, amaranath.somalapuram,
	Arunpravin.PaneerSelvam, sumit.semwal, gustavo, airlied, daniel,
	linux-media, dri-devel, linaro-mm-sig, linux-kernel

On Sat, Sep 17, 2022 at 05:18:40PM +0200, Christian König wrote:
> Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
> > On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
> >> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
> >>> On 9/15/2022 5:37 PM, Christian König wrote:
> >>>> Is that sufficient to allow running a desktop on amdgpu with the
> >>>> extra check enabled? If yes that would be quite a milestone.
> >>>>
> >>> Yes, It is running on amdgpu with extra config enabled.
> >> In this case I will start pushing the patches to drm-misc-next. I'm just
> >> going to leave out the last one until the IGT tests are working as well.
> > ffs Christian. intel CI blew up yet again:
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12146/shard-glk7/igt@kms_plane_lowres@tiling-y@pipe-c-hdmi-a-2.html
> >
> > The last time (some ttm thing) was just a week or two ago,
> > so it's really getting tiresome watching you push entirely
> > untested stuff all the time. Would be really helpful if you
> > finally started to do/require premerge testing.
> 
> Well first of all sorry for causing trouble, but as I wrote above I 
> intentionally left out the last one to *not* break the IGT tests.
> 
> The patches pushed so far where just updating a bunch of corner cases 
> and fixing the selftests.
> 
> Do you have any more insight why that should affect the IGT tests?

I have no idea. You have the oopses from pstore right there.
Did you even look at them?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
  2022-09-19 11:26             ` Ville Syrjälä
@ 2022-09-19 11:38               ` Christian König
  -1 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-19 11:38 UTC (permalink / raw)
  To: Ville Syrjälä, Christian König
  Cc: Yadav, Arvind, Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 19.09.22 um 13:26 schrieb Ville Syrjälä:
> On Sat, Sep 17, 2022 at 05:18:40PM +0200, Christian König wrote:
>> Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
>>> On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
>>>> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
>>>>> On 9/15/2022 5:37 PM, Christian König wrote:
>>>>>> Is that sufficient to allow running a desktop on amdgpu with the
>>>>>> extra check enabled? If yes that would be quite a milestone.
>>>>>>
>>>>> Yes, It is running on amdgpu with extra config enabled.
>>>> In this case I will start pushing the patches to drm-misc-next. I'm just
>>>> going to leave out the last one until the IGT tests are working as well.
>>> ffs Christian. intel CI blew up yet again:
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FCI_DRM_12146%2Fshard-glk7%2Figt%40kms_plane_lowres%40tiling-y%40pipe-c-hdmi-a-2.html&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C31a4fd82204b4eada97708da9a31d922%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637991836142423547%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=TqPiX483fF%2FUdZHTjle8k5XplcF3DVaZBs0IzQlNYck%3D&amp;reserved=0
>>>
>>> The last time (some ttm thing) was just a week or two ago,
>>> so it's really getting tiresome watching you push entirely
>>> untested stuff all the time. Would be really helpful if you
>>> finally started to do/require premerge testing.
>> Well first of all sorry for causing trouble, but as I wrote above I
>> intentionally left out the last one to *not* break the IGT tests.
>>
>> The patches pushed so far where just updating a bunch of corner cases
>> and fixing the selftests.
>>
>> Do you have any more insight why that should affect the IGT tests?
> I have no idea. You have the oopses from pstore right there.
> Did you even look at them?

Ah! Sorry, I didn't see that there were additional links to the oopses. 
Yeah, the problem is obvious with them.

The check for the signaled bit comes before grabbing the lock. This only 
worked before because of the __dma_fence_enable_sw_signaling() shortcut.

Going to send a fix for this in a minute.

Thanks,
Christian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug
@ 2022-09-19 11:38               ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-19 11:38 UTC (permalink / raw)
  To: Ville Syrjälä, Christian König
  Cc: Arunpravin.PaneerSelvam, airlied, gustavo, amaranath.somalapuram,
	linux-kernel, Arvind Yadav, linaro-mm-sig, dri-devel, Yadav,
	Arvind, shashank.sharma, sumit.semwal, linux-media

Am 19.09.22 um 13:26 schrieb Ville Syrjälä:
> On Sat, Sep 17, 2022 at 05:18:40PM +0200, Christian König wrote:
>> Am 17.09.22 um 08:17 schrieb Ville Syrjälä:
>>> On Thu, Sep 15, 2022 at 06:05:30PM +0200, Christian König wrote:
>>>> Am 15.09.22 um 15:02 schrieb Yadav, Arvind:
>>>>> On 9/15/2022 5:37 PM, Christian König wrote:
>>>>>> Is that sufficient to allow running a desktop on amdgpu with the
>>>>>> extra check enabled? If yes that would be quite a milestone.
>>>>>>
>>>>> Yes, It is running on amdgpu with extra config enabled.
>>>> In this case I will start pushing the patches to drm-misc-next. I'm just
>>>> going to leave out the last one until the IGT tests are working as well.
>>> ffs Christian. intel CI blew up yet again:
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FCI_DRM_12146%2Fshard-glk7%2Figt%40kms_plane_lowres%40tiling-y%40pipe-c-hdmi-a-2.html&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C31a4fd82204b4eada97708da9a31d922%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637991836142423547%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=TqPiX483fF%2FUdZHTjle8k5XplcF3DVaZBs0IzQlNYck%3D&amp;reserved=0
>>>
>>> The last time (some ttm thing) was just a week or two ago,
>>> so it's really getting tiresome watching you push entirely
>>> untested stuff all the time. Would be really helpful if you
>>> finally started to do/require premerge testing.
>> Well first of all sorry for causing trouble, but as I wrote above I
>> intentionally left out the last one to *not* break the IGT tests.
>>
>> The patches pushed so far where just updating a bunch of corner cases
>> and fixing the selftests.
>>
>> Do you have any more insight why that should affect the IGT tests?
> I have no idea. You have the oopses from pstore right there.
> Did you even look at them?

Ah! Sorry, I didn't see that there were additional links to the oopses. 
Yeah, the problem is obvious with them.

The check for the signaled bit comes before grabbing the lock. This only 
worked before because of the __dma_fence_enable_sw_signaling() shortcut.

Going to send a fix for this in a minute.

Thanks,
Christian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  2022-09-14 16:43 ` [PATCH v4 5/6] drm/sched: Use parent fence instead of finished Arvind Yadav
@ 2022-09-29 14:53   ` Steven Price
  2022-09-29 14:57     ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Price @ 2022-09-29 14:53 UTC (permalink / raw)
  To: Arvind Yadav, Christian.Koenig, andrey.grodzovsky,
	shashank.sharma, amaranath.somalapuram, Arunpravin.PaneerSelvam,
	sumit.semwal, gustavo, airlied, daniel, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel

On 14/09/2022 17:43, Arvind Yadav wrote:
> Using the parent fence instead of the finished fence
> to get the job status. This change is to avoid GPU
> scheduler timeout error which can cause GPU reset.

I'm able to reproduce crashes on Panfrost and I believe this commit is
the cause. Specifically it's possible for job->s_fence->parent to be NULL.

The underlying issue seems to involve drm_sched_resubmit_jobs_ext() - if
the run_jobs() callback returns an error it will set s_fence->parent to
NULL after signalling s_fence->finished:

> 		fence = sched->ops->run_job(s_job);
> 		i++;
> 
> 		if (IS_ERR_OR_NULL(fence)) {
> 			if (IS_ERR(fence))
> 				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
> 
> 			s_job->s_fence->parent = NULL;

I don't understand the reasoning behind this change, but it doesn't seem
right to be using the parent fence when we have code which can be
setting that pointer to NULL.

Since I don't understand the reasoning my only suggestion is to revert
this patch (and potentially the dependent patch "dma-buf: Check status
of enable-signaling bit on debug"?).

Can anyone suggest a better fix?

Thanks,

Steve

> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
> 
> changes in v1,v2 - Enable signaling for finished fence in sche_main()
> is removed
> 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e0ab14e0fb6b..2ac28ad11432 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  	job = list_first_entry_or_null(&sched->pending_list,
>  				       struct drm_sched_job, list);
>  
> -	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> +	if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>  		/* remove job from pending_list */
>  		list_del_init(&job->list);
>  
> @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  
>  		if (next) {
>  			next->s_fence->scheduled.timestamp =
> -				job->s_fence->finished.timestamp;
> +				job->s_fence->parent->timestamp;
>  			/* start TO timer for next job */
>  			drm_sched_start_timeout(sched);
>  		}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  2022-09-29 14:53   ` Steven Price
@ 2022-09-29 14:57     ` Christian König
  2022-09-29 15:31       ` Steven Price
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-09-29 14:57 UTC (permalink / raw)
  To: Steven Price, Arvind Yadav, andrey.grodzovsky, shashank.sharma,
	amaranath.somalapuram, Arunpravin.PaneerSelvam, sumit.semwal,
	gustavo, airlied, daniel, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel

Am 29.09.22 um 16:53 schrieb Steven Price:
> On 14/09/2022 17:43, Arvind Yadav wrote:
>> Using the parent fence instead of the finished fence
>> to get the job status. This change is to avoid GPU
>> scheduler timeout error which can cause GPU reset.
> I'm able to reproduce crashes on Panfrost and I believe this commit is
> the cause. Specifically it's possible for job->s_fence->parent to be NULL.
>
> The underlying issue seems to involve drm_sched_resubmit_jobs_ext() - if
> the run_jobs() callback returns an error it will set s_fence->parent to
> NULL after signalling s_fence->finished:
>
>> 		fence = sched->ops->run_job(s_job);
>> 		i++;
>>
>> 		if (IS_ERR_OR_NULL(fence)) {
>> 			if (IS_ERR(fence))
>> 				dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>
>> 			s_job->s_fence->parent = NULL;
> I don't understand the reasoning behind this change, but it doesn't seem
> right to be using the parent fence when we have code which can be
> setting that pointer to NULL.
>
> Since I don't understand the reasoning my only suggestion is to revert
> this patch (and potentially the dependent patch "dma-buf: Check status
> of enable-signaling bit on debug"?).
>
> Can anyone suggest a better fix?

Well, first of all please absolutely don't use 
drm_sched_resubmit_jobs_ext()!

It was an extremely bad idea in amdgpu to approach GPU by re-submitting 
jobs and it was an even worse idea to push this into the scheduler.

The design of dma_fence is that you submit that once and *only* once and 
then get a result for this submission. If re-submission is desirable it 
should be done in userspace or at least higher levels.

Apart from that, yes a NULL check is missing here but that should be 
trivial to fix.

Thanks,
Christian.

>
> Thanks,
>
> Steve
>
>> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
>> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>
>> changes in v1,v2 - Enable signaling for finished fence in sche_main()
>> is removed
>>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index e0ab14e0fb6b..2ac28ad11432 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   	job = list_first_entry_or_null(&sched->pending_list,
>>   				       struct drm_sched_job, list);
>>   
>> -	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>> +	if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>>   		/* remove job from pending_list */
>>   		list_del_init(&job->list);
>>   
>> @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   
>>   		if (next) {
>>   			next->s_fence->scheduled.timestamp =
>> -				job->s_fence->finished.timestamp;
>> +				job->s_fence->parent->timestamp;
>>   			/* start TO timer for next job */
>>   			drm_sched_start_timeout(sched);
>>   		}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  2022-09-29 14:57     ` Christian König
@ 2022-09-29 15:31       ` Steven Price
  2022-09-29 17:07         ` [Linaro-mm-sig] " Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Price @ 2022-09-29 15:31 UTC (permalink / raw)
  To: Christian König, Arvind Yadav, andrey.grodzovsky,
	shashank.sharma, amaranath.somalapuram, Arunpravin.PaneerSelvam,
	sumit.semwal, gustavo, airlied, daniel, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel

On 29/09/2022 15:57, Christian König wrote:
> Am 29.09.22 um 16:53 schrieb Steven Price:
>> On 14/09/2022 17:43, Arvind Yadav wrote:
>>> Using the parent fence instead of the finished fence
>>> to get the job status. This change is to avoid GPU
>>> scheduler timeout error which can cause GPU reset.
>> I'm able to reproduce crashes on Panfrost and I believe this commit is
>> the cause. Specifically it's possible for job->s_fence->parent to be
>> NULL.
>>
>> The underlying issue seems to involve drm_sched_resubmit_jobs_ext() - if
>> the run_jobs() callback returns an error it will set s_fence->parent to
>> NULL after signalling s_fence->finished:
>>
>>>         fence = sched->ops->run_job(s_job);
>>>         i++;
>>>
>>>         if (IS_ERR_OR_NULL(fence)) {
>>>             if (IS_ERR(fence))
>>>                 dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>>
>>>             s_job->s_fence->parent = NULL;
>> I don't understand the reasoning behind this change, but it doesn't seem
>> right to be using the parent fence when we have code which can be
>> setting that pointer to NULL.
>>
>> Since I don't understand the reasoning my only suggestion is to revert
>> this patch (and potentially the dependent patch "dma-buf: Check status
>> of enable-signaling bit on debug"?).
>>
>> Can anyone suggest a better fix?
> 
> Well, first of all please absolutely don't use
> drm_sched_resubmit_jobs_ext()!

Panfrost isn't using drm_sched_resubmit_jobs_ext() directly but via
drm_sched_resubmit_jobs().

> It was an extremely bad idea in amdgpu to approach GPU by re-submitting
> jobs and it was an even worse idea to push this into the scheduler.
> 
> The design of dma_fence is that you submit that once and *only* once and
> then get a result for this submission. If re-submission is desirable it
> should be done in userspace or at least higher levels.

Panfrost has an interesting feature where it's possible to rescue a job
during a GPU reset. Because jobs are queued on the GPU if the job hasn't
actually started executing then it's quite possible to safely resubmit
it from the kernel driver and user space doesn't need to be involved.

The benefit of this is if another process has hung the GPU that
processes jobs can be killed off without affecting any other innocent
processes.

One option would be to hide all this from the scheduler, but I can't see
how to do that without also hiding the actual reset from the scheduler.
Admittedly at the moment Panfrost is far too aggressive at resetting and
will perform a GPU reset in conditions where it's completely
unnecessary. There's work to do there but I haven't had the time to look
at it yet.

> Apart from that, yes a NULL check is missing here but that should be
> trivial to fix.

What I'm struggling to get my head round is whether it's correct to
always treat the job as signalled just because s_fence->parent is NULL?

Thanks,

Steve

> Thanks,
> Christian.
> 
>>
>> Thanks,
>>
>> Steve
>>
>>> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
>>> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>
>>> changes in v1,v2 - Enable signaling for finished fence in sche_main()
>>> is removed
>>>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index e0ab14e0fb6b..2ac28ad11432 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct
>>> drm_gpu_scheduler *sched)
>>>       job = list_first_entry_or_null(&sched->pending_list,
>>>                          struct drm_sched_job, list);
>>>   -    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>> +    if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>>>           /* remove job from pending_list */
>>>           list_del_init(&job->list);
>>>   @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct
>>> drm_gpu_scheduler *sched)
>>>             if (next) {
>>>               next->s_fence->scheduled.timestamp =
>>> -                job->s_fence->finished.timestamp;
>>> +                job->s_fence->parent->timestamp;
>>>               /* start TO timer for next job */
>>>               drm_sched_start_timeout(sched);
>>>           }
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH v4 5/6] drm/sched: Use parent fence instead of finished
  2022-09-29 15:31       ` Steven Price
@ 2022-09-29 17:07         ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-29 17:07 UTC (permalink / raw)
  To: Steven Price, Christian König, Arvind Yadav,
	andrey.grodzovsky, shashank.sharma, amaranath.somalapuram,
	Arunpravin.PaneerSelvam, sumit.semwal, gustavo, airlied, daniel,
	linux-media, dri-devel, linaro-mm-sig, linux-kernel

Am 29.09.22 um 17:31 schrieb Steven Price:
> On 29/09/2022 15:57, Christian König wrote:
>> Am 29.09.22 um 16:53 schrieb Steven Price:
>>> On 14/09/2022 17:43, Arvind Yadav wrote:
>>>> Using the parent fence instead of the finished fence
>>>> to get the job status. This change is to avoid GPU
>>>> scheduler timeout error which can cause GPU reset.
>>> I'm able to reproduce crashes on Panfrost and I believe this commit is
>>> the cause. Specifically it's possible for job->s_fence->parent to be
>>> NULL.
>>>
>>> The underlying issue seems to involve drm_sched_resubmit_jobs_ext() - if
>>> the run_jobs() callback returns an error it will set s_fence->parent to
>>> NULL after signalling s_fence->finished:
>>>
>>>>          fence = sched->ops->run_job(s_job);
>>>>          i++;
>>>>
>>>>          if (IS_ERR_OR_NULL(fence)) {
>>>>              if (IS_ERR(fence))
>>>>                  dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>>>
>>>>              s_job->s_fence->parent = NULL;
>>> I don't understand the reasoning behind this change, but it doesn't seem
>>> right to be using the parent fence when we have code which can be
>>> setting that pointer to NULL.
>>>
>>> Since I don't understand the reasoning my only suggestion is to revert
>>> this patch (and potentially the dependent patch "dma-buf: Check status
>>> of enable-signaling bit on debug"?).
>>>
>>> Can anyone suggest a better fix?
>> Well, first of all please absolutely don't use
>> drm_sched_resubmit_jobs_ext()!
> Panfrost isn't using drm_sched_resubmit_jobs_ext() directly but via
> drm_sched_resubmit_jobs().

Yeah, but it's the same problem that this isn't designed very well.

>> It was an extremely bad idea in amdgpu to approach GPU by re-submitting
>> jobs and it was an even worse idea to push this into the scheduler.
>>
>> The design of dma_fence is that you submit that once and *only* once and
>> then get a result for this submission. If re-submission is desirable it
>> should be done in userspace or at least higher levels.
> Panfrost has an interesting feature where it's possible to rescue a job
> during a GPU reset. Because jobs are queued on the GPU if the job hasn't
> actually started executing then it's quite possible to safely resubmit
> it from the kernel driver and user space doesn't need to be involved.

That's actually fine. E.g. when you can save the hardware state and 
restart it there is nothing as far as I can see which speaks against that.

The problem is rather pushing this into the scheduler because and trying 
to fit the square pig through a round hole.

You either end up allocating memory while inside a GPU reset (which is 
illegal because allocating memory could need to wait for the reset to 
finish). Or you end up re-using the same dma_fence object twice (which 
in turn is illegal from the dma_fence design).

> The benefit of this is if another process has hung the GPU that
> processes jobs can be killed off without affecting any other innocent
> processes.
>
> One option would be to hide all this from the scheduler, but I can't see
> how to do that without also hiding the actual reset from the scheduler.
> Admittedly at the moment Panfrost is far too aggressive at resetting and
> will perform a GPU reset in conditions where it's completely
> unnecessary. There's work to do there but I haven't had the time to look
> at it yet.
>
>> Apart from that, yes a NULL check is missing here but that should be
>> trivial to fix.
> What I'm struggling to get my head round is whether it's correct to
> always treat the job as signalled just because s_fence->parent is NULL?

Well s_fence parent will never be set to something else than NULL in 
this situation, isn't it?

The problem with using the finished fence is that this is actually the 
public interface of the scheduler instead of the internal state.

In other words s_fence->parent is what the scheduler deals with to 
produce the s_fence->finished state.

Christian.

>
> Thanks,
>
> Steve
>
>> Thanks,
>> Christian.
>>
>>> Thanks,
>>>
>>> Steve
>>>
>>>> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
>>>> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>
>>>> changes in v1,v2 - Enable signaling for finished fence in sche_main()
>>>> is removed
>>>>
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index e0ab14e0fb6b..2ac28ad11432 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -829,7 +829,7 @@ drm_sched_get_cleanup_job(struct
>>>> drm_gpu_scheduler *sched)
>>>>        job = list_first_entry_or_null(&sched->pending_list,
>>>>                           struct drm_sched_job, list);
>>>>    -    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>> +    if (job && dma_fence_is_signaled(job->s_fence->parent)) {
>>>>            /* remove job from pending_list */
>>>>            list_del_init(&job->list);
>>>>    @@ -841,7 +841,7 @@ drm_sched_get_cleanup_job(struct
>>>> drm_gpu_scheduler *sched)
>>>>              if (next) {
>>>>                next->s_fence->scheduled.timestamp =
>>>> -                job->s_fence->finished.timestamp;
>>>> +                job->s_fence->parent->timestamp;
>>>>                /* start TO timer for next job */
>>>>                drm_sched_start_timeout(sched);
>>>>            }
> _______________________________________________
> Linaro-mm-sig mailing list -- linaro-mm-sig@lists.linaro.org
> To unsubscribe send an email to linaro-mm-sig-leave@lists.linaro.org


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-09-29 17:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-14 16:43 [PATCH v4 0/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
2022-09-14 16:43 ` [PATCH v4 1/6] dma-buf: Remove the signaled bit status check Arvind Yadav
2022-09-14 16:43   ` Arvind Yadav
2022-09-14 16:43 ` [PATCH v4 2/6] dma-buf: set signaling bit for the stub fence Arvind Yadav
2022-09-14 16:43   ` Arvind Yadav
2022-09-14 16:43 ` [PATCH v4 3/6] dma-buf: Enable signaling on fence for selftests Arvind Yadav
2022-09-14 16:43   ` Arvind Yadav
2022-09-14 16:43 ` [PATCH v4 4/6] dma-buf: dma_fence_wait must enable signaling Arvind Yadav
2022-09-15 12:06   ` Christian König
2022-09-14 16:43 ` [PATCH v4 5/6] drm/sched: Use parent fence instead of finished Arvind Yadav
2022-09-29 14:53   ` Steven Price
2022-09-29 14:57     ` Christian König
2022-09-29 15:31       ` Steven Price
2022-09-29 17:07         ` [Linaro-mm-sig] " Christian König
2022-09-14 16:43 ` [PATCH v4 6/6] dma-buf: Check status of enable-signaling bit on debug Arvind Yadav
2022-09-15 12:06   ` Christian König
2022-09-15 12:07 ` [PATCH v4 0/6] " Christian König
2022-09-15 13:02   ` Yadav, Arvind
2022-09-15 16:05     ` Christian König
2022-09-17  6:17       ` Ville Syrjälä
2022-09-17  6:17         ` Ville Syrjälä
2022-09-17 15:18         ` [Linaro-mm-sig] " Christian König
2022-09-17 15:18           ` Christian König
2022-09-19 11:26           ` Ville Syrjälä
2022-09-19 11:26             ` Ville Syrjälä
2022-09-19 11:38             ` Christian König
2022-09-19 11:38               ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.