* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S-5C7GfCeVMHo,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
Christian.Koenig-5C7GfCeVMHo
This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
We will do a proper fix in next patch.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
continue;
}
- for (i = 0; i < num_entities; i++) {
- mutex_lock(&ctx->adev->lock_reset);
+ for (i = 0; i < num_entities; i++)
drm_sched_entity_fini(&ctx->entities[0][i].entity);
- mutex_unlock(&ctx->adev->lock_reset);
- }
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
We will do a proper fix in next patch.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
continue;
}
- for (i = 0; i < num_entities; i++) {
- mutex_lock(&ctx->adev->lock_reset);
+ for (i = 0; i < num_entities; i++)
drm_sched_entity_fini(&ctx->entities[0][i].entity);
- mutex_unlock(&ctx->adev->lock_reset);
- }
}
}
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig
This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
We will do a proper fix in next patch.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 2cdaf3b..6614d8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
continue;
}
- for (i = 0; i < num_entities; i++) {
- mutex_lock(&ctx->adev->lock_reset);
+ for (i = 0; i < num_entities; i++)
drm_sched_entity_fini(&ctx->entities[0][i].entity);
- mutex_unlock(&ctx->adev->lock_reset);
- }
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
include/drm/gpu_scheduler.h | 2 ++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
#include <linux/kthread.h>
#include <linux/slab.h>
+#include <linux/completion.h>
#include <drm/drm_print.h>
#include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
if (!entity->rq_list)
return -ENOMEM;
+ init_completion(&entity->entity_idle);
+
for (i = 0; i < num_rq_list; ++i)
entity->rq_list[i] = rq_list[i];
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
*/
if (spsc_queue_count(&entity->job_queue)) {
if (sched) {
- /* Park the kernel for a moment to make sure it isn't processing
- * our enity.
+ /*
+ * Wait for thread to idle to make sure it isn't processing
+ * this entity.
*/
- kthread_park(sched->thread);
- kthread_unpark(sched->thread);
+ wait_for_completion(&entity->entity_idle);
+
}
if (entity->dependency) {
dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/sched.h>
+#include <linux/completion.h>
#include <uapi/linux/sched/types.h>
#include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
list_for_each_entry_continue(entity, &rq->entities, list) {
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
continue;
sched_job = drm_sched_entity_pop_job(entity);
+
+ complete(&entity->entity_idle);
+
if (!sched_job)
continue;
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
* @last_scheduled: points to the finished fence of the last scheduled job.
* @last_user: last group leader pushing a job into the entity.
* @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
*
* Entities will emit jobs in order to their corresponding hardware
* ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
struct dma_fence *last_scheduled;
struct task_struct *last_user;
bool stopped;
+ struct completion entity_idle;
};
/**
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
include/drm/gpu_scheduler.h | 2 ++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
#include <linux/kthread.h>
#include <linux/slab.h>
+#include <linux/completion.h>
#include <drm/drm_print.h>
#include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
if (!entity->rq_list)
return -ENOMEM;
+ init_completion(&entity->entity_idle);
+
for (i = 0; i < num_rq_list; ++i)
entity->rq_list[i] = rq_list[i];
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
*/
if (spsc_queue_count(&entity->job_queue)) {
if (sched) {
- /* Park the kernel for a moment to make sure it isn't processing
- * our enity.
+ /*
+ * Wait for thread to idle to make sure it isn't processing
+ * this entity.
*/
- kthread_park(sched->thread);
- kthread_unpark(sched->thread);
+ wait_for_completion(&entity->entity_idle);
+
}
if (entity->dependency) {
dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/sched.h>
+#include <linux/completion.h>
#include <uapi/linux/sched/types.h>
#include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
list_for_each_entry_continue(entity, &rq->entities, list) {
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
continue;
sched_job = drm_sched_entity_pop_job(entity);
+
+ complete(&entity->entity_idle);
+
if (!sched_job)
continue;
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
* @last_scheduled: points to the finished fence of the last scheduled job.
* @last_user: last group leader pushing a job into the entity.
* @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
*
* Entities will emit jobs in order to their corresponding hardware
* ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
struct dma_fence *last_scheduled;
struct task_struct *last_user;
bool stopped;
+ struct completion entity_idle;
};
/**
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] drm/sched: Use completion to wait for entity idle
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 12 ++++++++----
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
include/drm/gpu_scheduler.h | 2 ++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a51531..461a7a8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -23,6 +23,7 @@
#include <linux/kthread.h>
#include <linux/slab.h>
+#include <linux/completion.h>
#include <drm/drm_print.h>
#include <drm/gpu_scheduler.h>
@@ -68,6 +69,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
if (!entity->rq_list)
return -ENOMEM;
+ init_completion(&entity->entity_idle);
+
for (i = 0; i < num_rq_list; ++i)
entity->rq_list[i] = rq_list[i];
@@ -286,11 +289,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
*/
if (spsc_queue_count(&entity->job_queue)) {
if (sched) {
- /* Park the kernel for a moment to make sure it isn't processing
- * our enity.
+ /*
+ * Wait for thread to idle to make sure it isn't processing
+ * this entity.
*/
- kthread_park(sched->thread);
- kthread_unpark(sched->thread);
+ wait_for_completion(&entity->entity_idle);
+
}
if (entity->dependency) {
dma_fence_remove_callback(entity->dependency,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index dba4390..38bbad7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -47,6 +47,7 @@
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/sched.h>
+#include <linux/completion.h>
#include <uapi/linux/sched/types.h>
#include <drm/drm_print.h>
@@ -134,6 +135,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
list_for_each_entry_continue(entity, &rq->entities, list) {
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -144,6 +146,7 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
if (drm_sched_entity_is_ready(entity)) {
rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
spin_unlock(&rq->lock);
return entity;
}
@@ -721,6 +724,9 @@ static int drm_sched_main(void *param)
continue;
sched_job = drm_sched_entity_pop_job(entity);
+
+ complete(&entity->entity_idle);
+
if (!sched_job)
continue;
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 57b4121..6619d2a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,7 @@ enum drm_sched_priority {
* @last_scheduled: points to the finished fence of the last scheduled job.
* @last_user: last group leader pushing a job into the entity.
* @stopped: Marks the enity as removed from rq and destined for termination.
+ * @entity_idle: Signals when enityt is not in use
*
* Entities will emit jobs in order to their corresponding hardware
* ring, and the scheduler will alternate between entities based on
@@ -94,6 +95,7 @@ struct drm_sched_entity {
struct dma_fence *last_scheduled;
struct task_struct *last_user;
bool stopped;
+ struct completion entity_idle;
};
/**
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S-5C7GfCeVMHo,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
Christian.Koenig-5C7GfCeVMHo
When the sched thread is parked we assume ring_mirror_list is
not accessed from here.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
{
unsigned long flags;
- /* Don't destroy jobs while the timeout worker is running */
- if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
- !cancel_delayed_work(&sched->work_tdr))
+ /*
+ * Don't destroy jobs while the timeout worker is running OR thread
+ * is being parked and hence assumed to not touch ring_mirror_list
+ */
+ if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+ !cancel_delayed_work(&sched->work_tdr)) ||
+ __kthread_should_park(sched->thread))
return;
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
When the sched thread is parked we assume ring_mirror_list is
not accessed from here.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
{
unsigned long flags;
- /* Don't destroy jobs while the timeout worker is running */
- if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
- !cancel_delayed_work(&sched->work_tdr))
+ /*
+ * Don't destroy jobs while the timeout worker is running OR thread
+ * is being parked and hence assumed to not touch ring_mirror_list
+ */
+ if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+ !cancel_delayed_work(&sched->work_tdr)) ||
+ __kthread_should_park(sched->thread))
return;
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig
When the sched thread is parked we assume ring_mirror_list is
not accessed from here.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 38bbad7..80ddbdf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -647,9 +647,13 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
{
unsigned long flags;
- /* Don't destroy jobs while the timeout worker is running */
- if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
- !cancel_delayed_work(&sched->work_tdr))
+ /*
+ * Don't destroy jobs while the timeout worker is running OR thread
+ * is being parked and hence assumed to not touch ring_mirror_list
+ */
+ if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+ !cancel_delayed_work(&sched->work_tdr)) ||
+ __kthread_should_park(sched->thread))
return;
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.
Fix:
Serialize with GPU reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
struct amdgpu_device *adev = dev->dev_private;
int r = 0, i;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* hold on the scheduler */
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
kthread_unpark(ring->sched.thread);
}
+ mutex_unlock(&adev->lock_reset);
+
return 0;
}
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
if (!fences)
return -ENOMEM;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* stop the scheduler */
kthread_park(ring->sched.thread);
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
/* restart the scheduler */
kthread_unpark(ring->sched.thread);
+ mutex_unlock(&adev->lock_reset);
+
ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
kfree(fences);
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.
Fix:
Serialize with GPU reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
struct amdgpu_device *adev = dev->dev_private;
int r = 0, i;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* hold on the scheduler */
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
kthread_unpark(ring->sched.thread);
}
+ mutex_unlock(&adev->lock_reset);
+
return 0;
}
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
if (!fences)
return -ENOMEM;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* stop the scheduler */
kthread_park(ring->sched.thread);
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
/* restart the scheduler */
kthread_unpark(ring->sched.thread);
+ mutex_unlock(&adev->lock_reset);
+
ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
kfree(fences);
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation.
@ 2019-11-06 17:51 ` Andrey Grodzovsky
0 siblings, 0 replies; 15+ messages in thread
From: Andrey Grodzovsky @ 2019-11-06 17:51 UTC (permalink / raw)
Cc: Andrey Grodzovsky, Shirish.S, amd-gfx, dri-devel, Christian.Koenig
Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.
Fix:
Serialize with GPU reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 693f17e..8e6726e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -859,6 +859,9 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
struct amdgpu_device *adev = dev->dev_private;
int r = 0, i;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* hold on the scheduler */
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -884,6 +887,8 @@ static int amdgpu_debugfs_test_ib(struct seq_file *m, void *data)
kthread_unpark(ring->sched.thread);
}
+ mutex_unlock(&adev->lock_reset);
+
return 0;
}
@@ -1036,6 +1041,9 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
if (!fences)
return -ENOMEM;
+ /* Avoid accidently unparking the sched thread during GPU reset */
+ mutex_lock(&adev->lock_reset);
+
/* stop the scheduler */
kthread_park(ring->sched.thread);
@@ -1075,6 +1083,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
/* restart the scheduler */
kthread_unpark(ring->sched.thread);
+ mutex_unlock(&adev->lock_reset);
+
ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
kfree(fences);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07 9:24 ` Koenig, Christian
0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07 9:24 UTC (permalink / raw)
To: Grodzovsky, Andrey
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, S, Shirish
Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
The order of this one and patch #2 needs to be swapped, or otherwise we
have the bug in between those two commits again.
Apart from that the series is Reviewed-by: Christian König
<christian.koenig@amd.com>.
Thanks,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
> continue;
> }
>
> - for (i = 0; i < num_entities; i++) {
> - mutex_lock(&ctx->adev->lock_reset);
> + for (i = 0; i < num_entities; i++)
> drm_sched_entity_fini(&ctx->entities[0][i].entity);
> - mutex_unlock(&ctx->adev->lock_reset);
> - }
> }
> }
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07 9:24 ` Koenig, Christian
0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07 9:24 UTC (permalink / raw)
To: Grodzovsky, Andrey; +Cc: amd-gfx, dri-devel, S, Shirish
Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
The order of this one and patch #2 needs to be swapped, or otherwise we
have the bug in between those two commits again.
Apart from that the series is Reviewed-by: Christian König
<christian.koenig@amd.com>.
Thanks,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
> continue;
> }
>
> - for (i = 0; i < num_entities; i++) {
> - mutex_lock(&ctx->adev->lock_reset);
> + for (i = 0; i < num_entities; i++)
> drm_sched_entity_fini(&ctx->entities[0][i].entity);
> - mutex_unlock(&ctx->adev->lock_reset);
> - }
> }
> }
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"
@ 2019-11-07 9:24 ` Koenig, Christian
0 siblings, 0 replies; 15+ messages in thread
From: Koenig, Christian @ 2019-11-07 9:24 UTC (permalink / raw)
To: Grodzovsky, Andrey; +Cc: amd-gfx, dri-devel, S, Shirish
Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
The order of this one and patch #2 needs to be swapped, or otherwise we
have the bug in between those two commits again.
Apart from that the series is Reviewed-by: Christian König
<christian.koenig@amd.com>.
Thanks,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
> continue;
> }
>
> - for (i = 0; i < num_entities; i++) {
> - mutex_lock(&ctx->adev->lock_reset);
> + for (i = 0; i < num_entities; i++)
> drm_sched_entity_fini(&ctx->entities[0][i].entity);
> - mutex_unlock(&ctx->adev->lock_reset);
> - }
> }
> }
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2019-11-07 9:24 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-06 17:51 [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset" Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` [PATCH 2/4] drm/sched: Use completion to wait for entity idle Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` [PATCH 4/4] drm/amdgpu: Avoid accidental thread thread reactivation Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
[not found] ` <1573062672-23698-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
2019-11-06 17:51 ` [PATCH 3/4] drm/sched: Avoid job cleanup if sched thread is parked Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-06 17:51 ` Andrey Grodzovsky
2019-11-07 9:24 ` [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset" Koenig, Christian
2019-11-07 9:24 ` Koenig, Christian
2019-11-07 9:24 ` Koenig, Christian
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.