[PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian.Koenig-5C7GfCeVMHo

This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.

We will do a proper fix in next patch.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
 			continue;
 		}
 
-		for (i = 0; i < num_entities; i++) {
-			mutex_lock(&ctx->adev->lock_reset);
+		for (i = 0; i < num_entities; i++)
 			drm_sched_entity_fini(&ctx->entities[0][i].entity);
-			mutex_unlock(&ctx->adev->lock_reset);
-		}
 	}
 }
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.

We will do a proper fix in next patch.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
 			continue;
 		}
 
-		for (i = 0; i < num_entities; i++) {
-			mutex_lock(&ctx->adev->lock_reset);
+		for (i = 0; i < num_entities; i++)
 			drm_sched_entity_fini(&ctx->entities[0][i].entity);
-			mutex_unlock(&ctx->adev->lock_reset);
-		}
 	}
 }
 
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig

This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.

We will do a proper fix in next patch.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
 			continue;
 		}
 
-		for (i = 0; i < num_entities; i++) {
-			mutex_lock(&ctx->adev->lock_reset);
+		for (i = 0; i < num_entities; i++)
 			drm_sched_entity_fini(&ctx->entities[0][i].entity);
-			mutex_unlock(&ctx->adev->lock_reset);
-		}
 	}
 }
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
 drivers/gpu/drm/scheduler/sched_main.c   |  6 ++++++
 include/drm/gpu_scheduler.h              |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
 
 #include <linux/kthread.h>
 #include <linux/slab.h>
+#include <linux/completion.h>
 
 #include <drm/drm_print.h>
 #include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	if (!entity->rq_list)
 		return -ENOMEM;
 
+	init_completion(&entity->entity_idle);
+
 	for (i = 0; i < num_rq_list; ++i)
 		entity->rq_list[i] = rq_list[i];
 
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
 	 */
 	if (spsc_queue_count(&entity->job_queue)) {
 		if (sched) {
-			/* Park the kernel for a moment to make sure it isn't processing
-			 * our enity.
+			/*
+			 * Wait for thread to idle to make sure it isn't processing
+			 * this entity.
 			 */
-			kthread_park(sched->thread);
-			kthread_unpark(sched->thread);
+			wait_for_completion(&entity->entity_idle);
+
 		}
 		if (entity->dependency) {
 			dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
+#include <linux/completion.h>
 #include <uapi/linux/sched/types.h>
 
 #include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
 				rq->current_entity = entity;
+				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
 				return entity;
 			}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 
 		if (drm_sched_entity_is_ready(entity)) {
 			rq->current_entity = entity;
+			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
 			return entity;
 		}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
 			continue;
 
 		sched_job = drm_sched_entity_pop_job(entity);
+
+		complete(&entity->entity_idle);
+
 		if (!sched_job)
 			continue;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
  * @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
 	struct dma_fence                *last_scheduled;
 	struct task_struct		*last_user;
 	bool 				stopped;
+	struct completion		entity_idle;
 };
 
 /**
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
 drivers/gpu/drm/scheduler/sched_main.c   |  6 ++++++
 include/drm/gpu_scheduler.h              |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
 
 #include <linux/kthread.h>
 #include <linux/slab.h>
+#include <linux/completion.h>
 
 #include <drm/drm_print.h>
 #include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	if (!entity->rq_list)
 		return -ENOMEM;
 
+	init_completion(&entity->entity_idle);
+
 	for (i = 0; i < num_rq_list; ++i)
 		entity->rq_list[i] = rq_list[i];
 
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
 	 */
 	if (spsc_queue_count(&entity->job_queue)) {
 		if (sched) {
-			/* Park the kernel for a moment to make sure it isn't processing
-			 * our enity.
+			/*
+			 * Wait for thread to idle to make sure it isn't processing
+			 * this entity.
 			 */
-			kthread_park(sched->thread);
-			kthread_unpark(sched->thread);
+			wait_for_completion(&entity->entity_idle);
+
 		}
 		if (entity->dependency) {
 			dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
+#include <linux/completion.h>
 #include <uapi/linux/sched/types.h>
 
 #include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
 				rq->current_entity = entity;
+				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
 				return entity;
 			}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 
 		if (drm_sched_entity_is_ready(entity)) {
 			rq->current_entity = entity;
+			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
 			return entity;
 		}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
 			continue;
 
 		sched_job = drm_sched_entity_pop_job(entity);
+
+		complete(&entity->entity_idle);
+
 		if (!sched_job)
 			continue;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
  * @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
 	struct dma_fence                *last_scheduled;
 	struct task_struct		*last_user;
 	bool 				stopped;
+	struct completion		entity_idle;
 };
 
 /**
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
 drivers/gpu/drm/scheduler/sched_main.c   |  6 ++++++
 include/drm/gpu_scheduler.h              |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
 
 #include <linux/kthread.h>
 #include <linux/slab.h>
+#include <linux/completion.h>
 
 #include <drm/drm_print.h>
 #include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	if (!entity->rq_list)
 		return -ENOMEM;
 
+	init_completion(&entity->entity_idle);
+
 	for (i = 0; i < num_rq_list; ++i)
 		entity->rq_list[i] = rq_list[i];
 
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
 	 */
 	if (spsc_queue_count(&entity->job_queue)) {
 		if (sched) {
-			/* Park the kernel for a moment to make sure it isn't processing
-			 * our enity.
+			/*
+			 * Wait for thread to idle to make sure it isn't processing
+			 * this entity.
 			 */
-			kthread_park(sched->thread);
-			kthread_unpark(sched->thread);
+			wait_for_completion(&entity->entity_idle);
+
 		}
 		if (entity->dependency) {
 			dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
+#include <linux/completion.h>
 #include <uapi/linux/sched/types.h>
 
 #include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
 				rq->current_entity = entity;
+				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
 				return entity;
 			}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
 
 		if (drm_sched_entity_is_ready(entity)) {
 			rq->current_entity = entity;
+			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
 			return entity;
 		}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
 			continue;
 
 		sched_job = drm_sched_entity_pop_job(entity);
+
+		complete(&entity->entity_idle);
+
 		if (!sched_job)
 			continue;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
  * @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
 	struct dma_fence                *last_scheduled;
 	struct task_struct		*last_user;
 	bool 				stopped;
+	struct completion		entity_idle;
 };
 
 /**
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51     ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian.Koenig-5C7GfCeVMHo

When the sched thread is parked we assume ring_mirror_list is
not accessed from here.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 {
 	unsigned long flags;
 
-	/* Don't destroy jobs while the timeout worker is running */
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr))
+	/*
+	 * Don't destroy jobs while the timeout worker is running  OR thread
+	 * is being parked and hence assumed to not touch ring_mirror_list
+	 */
+	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !cancel_delayed_work(&sched->work_tdr)) ||
+	    __kthread_should_park(sched->thread))
 		return;
 
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51     ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

When the sched thread is parked we assume ring_mirror_list is
not accessed from here.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 {
 	unsigned long flags;
 
-	/* Don't destroy jobs while the timeout worker is running */
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr))
+	/*
+	 * Don't destroy jobs while the timeout worker is running  OR thread
+	 * is being parked and hence assumed to not touch ring_mirror_list
+	 */
+	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !cancel_delayed_work(&sched->work_tdr)) ||
+	    __kthread_should_park(sched->thread))
 		return;
 
 
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51     ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig

When the sched thread is parked we assume ring_mirror_list is
not accessed from here.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 {
 	unsigned long flags;
 
-	/* Don't destroy jobs while the timeout worker is running */
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr))
+	/*
+	 * Don't destroy jobs while the timeout worker is running  OR thread
+	 * is being parked and hence assumed to not touch ring_mirror_list
+	 */
+	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !cancel_delayed_work(&sched->work_tdr)) ||
+	    __kthread_should_park(sched->thread))
 		return;
 
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.

Fix:
Serialize with GPU reset.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 	struct amdgpu_device *adev = dev->dev_private;
 	int r = 0, i;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* hold on the scheduler */
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 		kthread_unpark(ring->sched.thread);
 	}
 
+	mutex_unlock(&adev->lock_reset);
+
 	return 0;
 }
 
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	if (!fences)
 		return -ENOMEM;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* stop the scheduler */
 	kthread_park(ring->sched.thread);
 
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	/* restart the scheduler */
 	kthread_unpark(ring->sched.thread);
 
+	mutex_unlock(&adev->lock_reset);
+
 	ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
 
 	kfree(fences);
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.

Fix:
Serialize with GPU reset.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 	struct amdgpu_device *adev = dev->dev_private;
 	int r = 0, i;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* hold on the scheduler */
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 		kthread_unpark(ring->sched.thread);
 	}
 
+	mutex_unlock(&adev->lock_reset);
+
 	return 0;
 }
 
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	if (!fences)
 		return -ENOMEM;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* stop the scheduler */
 	kthread_park(ring->sched.thread);
 
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	/* restart the scheduler */
 	kthread_unpark(ring->sched.thread);
 
+	mutex_unlock(&adev->lock_reset);
+
 	ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
 
 	kfree(fences);
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51   ` Andrey Grodzovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
  Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig

Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.

Fix:
Serialize with GPU reset.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 	struct amdgpu_device *adev = dev->dev_private;
 	int r = 0, i;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* hold on the scheduler */
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
 		kthread_unpark(ring->sched.thread);
 	}
 
+	mutex_unlock(&adev->lock_reset);
+
 	return 0;
 }
 
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	if (!fences)
 		return -ENOMEM;
 
+	/* Avoid accidently unparking the sched thread during GPU reset */
+	mutex_lock(&adev->lock_reset);
+
 	/* stop the scheduler */
 	kthread_park(ring->sched.thread);
 
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	/* restart the scheduler */
 	kthread_unpark(ring->sched.thread);
 
+	mutex_unlock(&adev->lock_reset);
+
 	ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
 
 	kfree(fences);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07  9:24     ` Koenig, Christian
  0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07  9:24 UTC (permalink / raw)
  To: Grodzovsky, Andrey
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, S, Shirish

Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

The order of this one and patch #2 needs to be swapped, or otherwise we 
have the bug in between those two commits again.

Apart from that the series is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>   			continue;
>   		}
>   
> -		for (i = 0; i < num_entities; i++) {
> -			mutex_lock(&ctx->adev->lock_reset);
> +		for (i = 0; i < num_entities; i++)
>   			drm_sched_entity_fini(&ctx->entities[0][i].entity);
> -			mutex_unlock(&ctx->adev->lock_reset);
> -		}
>   	}
>   }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07  9:24     ` Koenig, Christian
  0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07  9:24 UTC (permalink / raw)
  To: Grodzovsky, Andrey; +Cc: amd-gfx, dri-devel, S, Shirish

Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

The order of this one and patch #2 needs to be swapped, or otherwise we 
have the bug in between those two commits again.

Apart from that the series is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>   			continue;
>   		}
>   
> -		for (i = 0; i < num_entities; i++) {
> -			mutex_lock(&ctx->adev->lock_reset);
> +		for (i = 0; i < num_entities; i++)
>   			drm_sched_entity_fini(&ctx->entities[0][i].entity);
> -			mutex_unlock(&ctx->adev->lock_reset);
> -		}
>   	}
>   }
>   

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07  9:24     ` Koenig, Christian
  0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07  9:24 UTC (permalink / raw)
  To: Grodzovsky, Andrey; +Cc: amd-gfx, dri-devel, S, Shirish

Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

The order of this one and patch #2 needs to be swapped, or otherwise we 
have the bug in between those two commits again.

Apart from that the series is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>   			continue;
>   		}
>   
> -		for (i = 0; i < num_entities; i++) {
> -			mutex_lock(&ctx->adev->lock_reset);
> +		for (i = 0; i < num_entities; i++)
>   			drm_sched_entity_fini(&ctx->entities[0][i].entity);
> -			mutex_unlock(&ctx->adev->lock_reset);
> -		}
>   	}
>   }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-11-07  9:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-06 17:51 [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset" Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` [PATCH 2/4] drm/sched: Use completion to wait for entity idle Andrey Grodzovsky
2019-11-06 17:51   ` Andrey Grodzovsky
2019-11-06 17:51   ` Andrey Grodzovsky
2019-11-06 17:51 ` [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation Andrey Grodzovsky
2019-11-06 17:51   ` Andrey Grodzovsky
2019-11-06 17:51   ` Andrey Grodzovsky
     [not found] ` <1573062672-23698-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
2019-11-06 17:51   ` [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked Andrey Grodzovsky
2019-11-06 17:51     ` Andrey Grodzovsky
2019-11-06 17:51     ` Andrey Grodzovsky
2019-11-07  9:24   ` [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset" Koenig, Christian
2019-11-07  9:24     ` Koenig, Christian
2019-11-07  9:24     ` Koenig, Christian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.