dri-devel.lists.freedesktop.org archive mirror
* [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
@ 2021-03-15  5:20 Jack Zhang
  2021-03-15  5:23 ` Zhang, Jack (Jian)
  0 siblings, 1 reply; 20+ messages in thread
From: Jack Zhang @ 2021-03-15  5:20 UTC (permalink / raw)
  To: dri-devel, amd-gfx, Christian.Koenig, Andrey.Grodzovsky,
	Monk.Liu, Emily.Deng
  Cc: Jack Zhang

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs
in case it hits the memleak issue.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 79b9cc73763f..86463b0f936e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 					job ? job->base.id : -1);
 
 		/* even we skipped this reset, still need to set the job to guilty */
-		if (job)
+		if (job) {
 			drm_sched_increase_karma(&job->base);
+			r = DRM_GPU_SCHED_STAT_BAILING;
+		}
 		goto skip_recovery;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 759b34799221..41390bdacd9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
 	struct amdgpu_task_info ti;
 	struct amdgpu_device *adev = ring->adev;
+	int ret;
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		ret = amdgpu_device_gpu_recover(ring->adev, job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING)
+			return DRM_GPU_SCHED_STAT_BAILING;
+		else
+			return DRM_GPU_SCHED_STAT_NOMINAL;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
 		js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
 
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
 		 * See drm_sched_stop doc.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
 	DRM_GPU_SCHED_STAT_NOMINAL,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_BAILING,
 };
 
 /**
-- 
2.25.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-15  5:20 [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak Jack Zhang
@ 2021-03-15  5:23 ` Zhang, Jack (Jian)
  2021-03-16  7:19   ` Zhang, Jack (Jian)
  2021-03-22 15:29   ` Steven Price
  0 siblings, 2 replies; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-15  5:23 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks,
Jack Zhang

-----Original Message-----
From: Jack Zhang <Jack.Zhang1@amd.com> 
Sent: Monday, March 15, 2021 1:21 PM
To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>
Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs
in case it hits the memleak issue.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 79b9cc73763f..86463b0f936e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 					job ? job->base.id : -1);
 
 		/* even we skipped this reset, still need to set the job to guilty */
-		if (job)
+		if (job) {
 			drm_sched_increase_karma(&job->base);
+			r = DRM_GPU_SCHED_STAT_BAILING;
+		}
 		goto skip_recovery;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 759b34799221..41390bdacd9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
 	struct amdgpu_task_info ti;
 	struct amdgpu_device *adev = ring->adev;
+	int ret;
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		ret = amdgpu_device_gpu_recover(ring->adev, job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING)
+			return DRM_GPU_SCHED_STAT_BAILING;
+		else
+			return DRM_GPU_SCHED_STAT_NOMINAL;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
 		js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
 
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
 		 * See drm_sched_stop doc.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
 	DRM_GPU_SCHED_STAT_NOMINAL,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_BAILING,
 };
 
 /**
-- 
2.25.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-15  5:23 ` Zhang, Jack (Jian)
@ 2021-03-16  7:19   ` Zhang, Jack (Jian)
  2021-03-17  6:46     ` Zhang, Jack (Jian)
  2021-03-22 15:29   ` Steven Price
  1 sibling, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-16  7:19 UTC (permalink / raw)
  To: dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

[AMD Public Use]

Ping

-----Original Message-----
From: Zhang, Jack (Jian) 
Sent: Monday, March 15, 2021 1:24 PM
To: Jack Zhang <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks,
Jack Zhang

-----Original Message-----
From: Jack Zhang <Jack.Zhang1@amd.com>
Sent: Monday, March 15, 2021 1:21 PM
To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>
Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs in case it hits the memleak issue.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 79b9cc73763f..86463b0f936e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 					job ? job->base.id : -1);
 
 		/* even we skipped this reset, still need to set the job to guilty */
-		if (job)
+		if (job) {
 			drm_sched_increase_karma(&job->base);
+			r = DRM_GPU_SCHED_STAT_BAILING;
+		}
 		goto skip_recovery;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 759b34799221..41390bdacd9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
 	struct amdgpu_task_info ti;
 	struct amdgpu_device *adev = ring->adev;
+	int ret;
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		ret = amdgpu_device_gpu_recover(ring->adev, job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING)
+			return DRM_GPU_SCHED_STAT_BAILING;
+		else
+			return DRM_GPU_SCHED_STAT_NOMINAL;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
 		js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)  {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
 
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
 		 * See drm_sched_stop doc.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
 	DRM_GPU_SCHED_STAT_NOMINAL,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_BAILING,
 };
 
 /**
--
2.25.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-16  7:19   ` Zhang, Jack (Jian)
@ 2021-03-17  6:46     ` Zhang, Jack (Jian)
  2021-03-17  7:43       ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-17  6:46 UTC (permalink / raw)
  To: dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

Hi, Andrey/Christian and Team,

I haven't received any review feedback from the panfrost maintainers for several days,
and this patch is urgent for my current project.
Would you please help to give some review ideas?

Many Thanks,
Jack
-----Original Message-----
From: Zhang, Jack (Jian) 
Sent: Tuesday, March 16, 2021 3:20 PM
To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

[AMD Public Use]

Ping

-----Original Message-----
From: Zhang, Jack (Jian) 
Sent: Monday, March 15, 2021 1:24 PM
To: Jack Zhang <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks,
Jack Zhang

-----Original Message-----
From: Jack Zhang <Jack.Zhang1@amd.com>
Sent: Monday, March 15, 2021 1:21 PM
To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>
Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs in case it hits the memleak issue.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 79b9cc73763f..86463b0f936e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 					job ? job->base.id : -1);
 
 		/* even we skipped this reset, still need to set the job to guilty */
-		if (job)
+		if (job) {
 			drm_sched_increase_karma(&job->base);
+			r = DRM_GPU_SCHED_STAT_BAILING;
+		}
 		goto skip_recovery;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 759b34799221..41390bdacd9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	struct amdgpu_job *job = to_amdgpu_job(s_job);
 	struct amdgpu_task_info ti;
 	struct amdgpu_device *adev = ring->adev;
+	int ret;
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		ret = amdgpu_device_gpu_recover(ring->adev, job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING)
+			return DRM_GPU_SCHED_STAT_BAILING;
+		else
+			return DRM_GPU_SCHED_STAT_NOMINAL;
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
 		js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;
 
 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)  {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;
 
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
 
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}
 		/*
 		 * Guilty job did complete and hence needs to be manually removed
 		 * See drm_sched_stop doc.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
 	DRM_GPU_SCHED_STAT_NOMINAL,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_BAILING,
 };
 
 /**
--
2.25.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-17  6:46     ` Zhang, Jack (Jian)
@ 2021-03-17  7:43       ` Christian König
  2021-03-17 14:50         ` Andrey Grodzovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2021-03-17  7:43 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

I was hoping Andrey would take a look since I'm really busy with other 
work right now.

Regards,
Christian.

Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
> Hi, Andrey/Christian and Team,
>
> I didn't receive the reviewer's message from maintainers on panfrost driver for several days.
> Due to this patch is urgent for my current working project.
> Would you please help to give some review ideas?
>
> Many Thanks,
> Jack
> -----Original Message-----
> From: Zhang, Jack (Jian)
> Sent: Tuesday, March 16, 2021 3:20 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> [AMD Public Use]
>
> Ping
>
> -----Original Message-----
> From: Zhang, Jack (Jian)
> Sent: Monday, March 15, 2021 1:24 PM
> To: Jack Zhang <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> [AMD Public Use]
>
> Hi, Rob/Tomeu/Steven,
>
> Would you please help to review this patch for panfrost driver?
>
> Thanks,
> Jack Zhang
>
> -----Original Message-----
> From: Jack Zhang <Jack.Zhang1@amd.com>
> Sent: Monday, March 15, 2021 1:21 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>
> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> re-insert Bailing jobs to avoid memory leak.
>
> V2: move re-insert step to drm/scheduler logic
> V3: add panfrost's return value for bailing jobs in case it hits the memleak issue.
>
> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>   include/drm/gpu_scheduler.h                | 1 +
>   5 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 79b9cc73763f..86463b0f936e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   					job ? job->base.id : -1);
>   
>   		/* even we skipped this reset, still need to set the job to guilty */
> -		if (job)
> +		if (job) {
>   			drm_sched_increase_karma(&job->base);
> +			r = DRM_GPU_SCHED_STAT_BAILING;
> +		}
>   		goto skip_recovery;
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 759b34799221..41390bdacd9e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>   	struct amdgpu_job *job = to_amdgpu_job(s_job);
>   	struct amdgpu_task_info ti;
>   	struct amdgpu_device *adev = ring->adev;
> +	int ret;
>   
>   	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>   
> @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>   		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
>   
>   	if (amdgpu_device_should_recover_gpu(ring->adev)) {
> -		amdgpu_device_gpu_recover(ring->adev, job);
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		ret = amdgpu_device_gpu_recover(ring->adev, job);
> +		if (ret == DRM_GPU_SCHED_STAT_BAILING)
> +			return DRM_GPU_SCHED_STAT_BAILING;
> +		else
> +			return DRM_GPU_SCHED_STAT_NOMINAL;
>   	} else {
>   		drm_sched_suspend_timeout(&ring->sched);
>   		if (amdgpu_sriov_vf(adev))
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 6003cfeb1322..e2cb4f32dae1 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>   	 * spurious. Bail out.
>   	 */
>   	if (dma_fence_is_signaled(job->done_fence))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>   
>   	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>   		js,
> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>   
>   	/* Scheduler is already stopped, nothing to do. */
>   	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>   
>   	/* Schedule a reset if there's no reset in progress. */
>   	if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..a44f621fb5c4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)  {
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_job *job;
> +	int ret;
>   
>   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>   
> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   		list_del_init(&job->list);
>   		spin_unlock(&sched->job_list_lock);
>   
> -		job->sched->ops->timedout_job(job);
> +		ret = job->sched->ops->timedout_job(job);
>   
> +		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> +			spin_lock(&sched->job_list_lock);
> +			list_add(&job->node, &sched->ring_mirror_list);
> +			spin_unlock(&sched->job_list_lock);
> +		}
>   		/*
>   		 * Guilty job did complete and hence needs to be manually removed
>   		 * See drm_sched_stop doc.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>   	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>   	DRM_GPU_SCHED_STAT_NOMINAL,
>   	DRM_GPU_SCHED_STAT_ENODEV,
> +	DRM_GPU_SCHED_STAT_BAILING,
>   };
>   
>   /**
> --
> 2.25.1
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-17  7:43       ` Christian König
@ 2021-03-17 14:50         ` Andrey Grodzovsky
  2021-03-17 15:11           ` Zhang, Jack (Jian)
  0 siblings, 1 reply; 20+ messages in thread
From: Andrey Grodzovsky @ 2021-03-17 14:50 UTC (permalink / raw)
  To: Christian König, Zhang, Jack (Jian),
	dri-devel, amd-gfx, Koenig, Christian, Liu, Monk, Deng, Emily,
	Rob Herring, Tomeu Vizoso, Steven Price

I actually have a race condition concern here - see below -

On 2021-03-17 3:43 a.m., Christian König wrote:
> I was hoping Andrey would take a look since I'm really busy with other 
> work right now.
>
> Regards,
> Christian.
>
> Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>> Hi, Andrey/Christian and Team,
>>
>> I didn't receive the reviewer's message from maintainers on panfrost 
>> driver for several days.
>> Due to this patch is urgent for my current working project.
>> Would you please help to give some review ideas?
>>
>> Many Thanks,
>> Jack
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Tuesday, March 16, 2021 3:20 PM
>> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey 
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, 
>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu 
>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
>> memleak
>>
>> [AMD Public Use]
>>
>> Ping
>>
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Monday, March 15, 2021 1:24 PM
>> To: Jack Zhang <Jack.Zhang1@amd.com>; 
>> dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey 
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, 
>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu 
>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
>> memleak
>>
>> [AMD Public Use]
>>
>> Hi, Rob/Tomeu/Steven,
>>
>> Would you please help to review this patch for panfrost driver?
>>
>> Thanks,
>> Jack Zhang
>>
>> -----Original Message-----
>> From: Jack Zhang <Jack.Zhang1@amd.com>
>> Sent: Monday, March 15, 2021 1:21 PM
>> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey 
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, 
>> Emily <Emily.Deng@amd.com>
>> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
>> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>>
>> re-insert Bailing jobs to avoid memory leak.
>>
>> V2: move re-insert step to drm/scheduler logic
>> V3: add panfrost's return value for bailing jobs in case it hits the 
>> memleak issue.
>>
>> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>   include/drm/gpu_scheduler.h                | 1 +
>>   5 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 79b9cc73763f..86463b0f936e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct 
>> amdgpu_device *adev,
>>                       job ? job->base.id : -1);
>>             /* even we skipped this reset, still need to set the job 
>> to guilty */
>> -        if (job)
>> +        if (job) {
>>               drm_sched_increase_karma(&job->base);
>> +            r = DRM_GPU_SCHED_STAT_BAILING;
>> +        }
>>           goto skip_recovery;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 759b34799221..41390bdacd9e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat 
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       struct amdgpu_task_info ti;
>>       struct amdgpu_device *adev = ring->adev;
>> +    int ret;
>>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat 
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>> -        amdgpu_device_gpu_recover(ring->adev, job);
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>> +            return DRM_GPU_SCHED_STAT_BAILING;
>> +        else
>> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       } else {
>>           drm_sched_suspend_timeout(&ring->sched);
>>           if (amdgpu_sriov_vf(adev))
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 6003cfeb1322..e2cb4f32dae1 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat 
>> panfrost_job_timedout(struct drm_sched_job
>>        * spurious. Bail out.
>>        */
>>       if (dma_fence_is_signaled(job->done_fence))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, 
>> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>           js,
>> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat 
>> panfrost_job_timedout(struct drm_sched_job
>>         /* Scheduler is already stopped, nothing to do. */
>>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         /* Schedule a reset if there's no reset in progress. */
>>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git 
>> a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 92d8de24d0a1..a44f621fb5c4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct 
>> work_struct *work)  {
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>> +    int ret;
>>         sched = container_of(work, struct drm_gpu_scheduler, 
>> work_tdr.work);
>>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct 
>> work_struct *work)
>>           list_del_init(&job->list);
>>           spin_unlock(&sched->job_list_lock);
>>   -        job->sched->ops->timedout_job(job);
>> +        ret = job->sched->ops->timedout_job(job);
>>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>> +            spin_lock(&sched->job_list_lock);
>> +            list_add(&job->node, &sched->ring_mirror_list);
>> +            spin_unlock(&sched->job_list_lock);
>> +        }


At this point we don't hold the GPU reset locks anymore, and so we could
be racing against another TDR thread from another scheduler ring of the same
device or another XGMI hive member. The other thread might be in the middle
of a lockless iteration of the mirror list (drm_sched_stop, drm_sched_start
and drm_sched_resubmit) and so locking job_list_lock will not help. Looks
like it's required to take all GPU reset locks here.
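
As a rough illustration only, the serialization needed could look like the
sketch below; sched->reset_lock is a hypothetical lock, not an existing
scheduler or amdgpu field, and it is assumed to also be held around the
drm_sched_stop()/drm_sched_resubmit_jobs()/drm_sched_start() iteration in the
reset path (for an XGMI hive every member's reset lock would need the same
treatment):

		ret = job->sched->ops->timedout_job(job);

		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
			/* hypothetical: same lock the reset path holds while it
			 * walks the mirror list, so this list_add() cannot
			 * interleave with that lockless iteration */
			mutex_lock(&sched->reset_lock);
			spin_lock(&sched->job_list_lock);
			list_add(&job->node, &sched->ring_mirror_list);
			spin_unlock(&sched->job_list_lock);
			mutex_unlock(&sched->reset_lock);
		}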

Andrey


>>           /*
>>            * Guilty job did complete and hence needs to be manually 
>> removed
>>            * See drm_sched_stop doc.
>> diff --git a/include/drm/gpu_scheduler.h 
>> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       DRM_GPU_SCHED_STAT_ENODEV,
>> +    DRM_GPU_SCHED_STAT_BAILING,
>>   };
>>     /**
>> -- 
>> 2.25.1
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Ce90f30af0f43444c6aea08d8e91860c4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515638213180413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NnLqtz%2BZ8%2BweYwCqRinrfkqmhzibNAF6CYSdVqL6xi0%3D&amp;reserved=0 
>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-17 14:50         ` Andrey Grodzovsky
@ 2021-03-17 15:11           ` Zhang, Jack (Jian)
  2021-03-18 10:41             ` Zhang, Jack (Jian)
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-17 15:11 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx, Koenig, Christian, Liu,
	Monk, Deng,  Emily, Rob Herring, Tomeu Vizoso, Steven Price,
	Grodzovsky, Andrey


[-- Attachment #1.1: Type: text/plain, Size: 9597 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey,

Good catch, I will explore this corner case and give feedback soon~

Best,
Jack

________________________________
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Wednesday, March 17, 2021 10:50:59 PM
To: Christian König <ckoenig.leichtzumerken@gmail.com>; Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

I actually have a race condition concern here - see below -

On 2021-03-17 3:43 a.m., Christian König wrote:
> I was hoping Andrey would take a look since I'm really busy with other
> work right now.
>
> Regards,
> Christian.
>
> Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>> Hi, Andrey/Christian and Team,
>>
>> I didn't receive the reviewer's message from maintainers on panfrost
>> driver for several days.
>> Due to this patch is urgent for my current working project.
>> Would you please help to give some review ideas?
>>
>> Many Thanks,
>> Jack
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Tuesday, March 16, 2021 3:20 PM
>> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng,
>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Ping
>>
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Monday, March 15, 2021 1:24 PM
>> To: Jack Zhang <Jack.Zhang1@amd.com>;
>> dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng,
>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Hi, Rob/Tomeu/Steven,
>>
>> Would you please help to review this patch for panfrost driver?
>>
>> Thanks,
>> Jack Zhang
>>
>> -----Original Message-----
>> From: Jack Zhang <Jack.Zhang1@amd.com>
>> Sent: Monday, March 15, 2021 1:21 PM
>> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng,
>> Emily <Emily.Deng@amd.com>
>> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
>> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>>
>> re-insert Bailing jobs to avoid memory leak.
>>
>> V2: move re-insert step to drm/scheduler logic
>> V3: add panfrost's return value for bailing jobs in case it hits the
>> memleak issue.
>>
>> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>   include/drm/gpu_scheduler.h                | 1 +
>>   5 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 79b9cc73763f..86463b0f936e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>> amdgpu_device *adev,
>>                       job ? job->base.id : -1);
>>             /* even we skipped this reset, still need to set the job
>> to guilty */
>> -        if (job)
>> +        if (job) {
>>               drm_sched_increase_karma(&job->base);
>> +            r = DRM_GPU_SCHED_STAT_BAILING;
>> +        }
>>           goto skip_recovery;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 759b34799221..41390bdacd9e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       struct amdgpu_task_info ti;
>>       struct amdgpu_device *adev = ring->adev;
>> +    int ret;
>>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>> -        amdgpu_device_gpu_recover(ring->adev, job);
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>> +            return DRM_GPU_SCHED_STAT_BAILING;
>> +        else
>> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       } else {
>>           drm_sched_suspend_timeout(&ring->sched);
>>           if (amdgpu_sriov_vf(adev))
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 6003cfeb1322..e2cb4f32dae1 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>        * spurious. Bail out.
>>        */
>>       if (dma_fence_is_signaled(job->done_fence))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>           js,
>> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>         /* Scheduler is already stopped, nothing to do. */
>>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         /* Schedule a reset if there's no reset in progress. */
>>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>> a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 92d8de24d0a1..a44f621fb5c4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)  {
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>> +    int ret;
>>         sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>>           list_del_init(&job->list);
>>           spin_unlock(&sched->job_list_lock);
>>   -        job->sched->ops->timedout_job(job);
>> +        ret = job->sched->ops->timedout_job(job);
>>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>> +            spin_lock(&sched->job_list_lock);
>> +            list_add(&job->node, &sched->ring_mirror_list);
>> +            spin_unlock(&sched->job_list_lock);
>> +        }


At this point we don't hold the GPU reset locks anymore, and so we could
be racing against another TDR thread from another scheduler ring of the same
device or another XGMI hive member. The other thread might be in the middle
of a lockless iteration of the mirror list (drm_sched_stop, drm_sched_start
and drm_sched_resubmit) and so locking job_list_lock will not help. Looks
like it's required to take all GPU reset locks here.

Andrey


>>           /*
>>            * Guilty job did complete and hence needs to be manually
>> removed
>>            * See drm_sched_stop doc.
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       DRM_GPU_SCHED_STAT_ENODEV,
>> +    DRM_GPU_SCHED_STAT_BAILING,
>>   };
>>     /**
>> --
>> 2.25.1
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Ce90f30af0f43444c6aea08d8e91860c4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515638213180413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NnLqtz%2BZ8%2BweYwCqRinrfkqmhzibNAF6CYSdVqL6xi0%3D&amp;reserved=0
>>
>

[-- Attachment #1.2: Type: text/html, Size: 16188 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-17 15:11           ` Zhang, Jack (Jian)
@ 2021-03-18 10:41             ` Zhang, Jack (Jian)
  2021-03-18 16:16               ` Andrey Grodzovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-18 10:41 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	Christian König, dri-devel, amd-gfx, Koenig, Christian, Liu,
	Monk, Deng,  Emily, Rob Herring, Tomeu Vizoso, Steven Price,
	Grodzovsky, Andrey


[-- Attachment #1.1: Type: text/plain, Size: 14554 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey

Let me summarize the background of this patch:

In the TDR resubmit step, amdgpu_device_recheck_guilty_jobs submits the first job of each ring and re-checks whether that job is guilty.
At that point, we have to make sure every job is in the mirror list (or has already been re-inserted).

But we found that the current code never re-inserts the job into the mirror list in the 2nd, 3rd, ... job_timeout threads (the bailing TDR threads).
This not only leaks the bailing jobs; more importantly, the 1st TDR thread can never iterate over a bailing job and set its guilty status correctly.

Therefore, we have to re-insert the job (or simply not delete its node) for bailing jobs.

For the above V3 patch, the race condition I have in mind is that we cannot make sure all bailing jobs have finished before we run amdgpu_device_recheck_guilty_jobs.

Based on this insight, I think we have two options to solve this issue:

  1.  Skip deleting the node in TDR thread 2, thread 3, 4, ... (using a mutex or an atomic variable)
  2.  Re-insert the bailing job, and meanwhile use a semaphore in each TDR thread to keep the expected sequence and ensure every job is in the mirror list when we do the resubmit step (a rough sketch follows the Option 1 code below).

For Option 1, the logic is simpler and we need only one global atomic variable.
What do you think about this plan?

Option 1 would look like the following logic:


+static atomic_t in_reset;             //a global atomic var for synchronization
static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb);
 /**
@@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
                 * is parked at which point it's safe.
                 */
+               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip deleting the node if it's TDR thread 2, 3, 4, ...
+                       spin_unlock(&sched->job_list_lock);
+                       drm_sched_start_timeout(sched);
+                       return;
+               }
+
                list_del_init(&job->node);
                spin_unlock(&sched->job_list_lock);
@@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
        spin_lock(&sched->job_list_lock);
        drm_sched_start_timeout(sched);
        spin_unlock(&sched->job_list_lock);
+       atomic_set(&in_reset, 0); //reset in_reset when the first thread finished tdr
}
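
For comparison, a very rough sketch of Option 2 (tdr_sem is illustrative only,
not existing code; in practice it would likely be one semaphore per device),
where every TDR thread is serialized for the whole handler and a bailing job
is put back on the mirror list:

+static DEFINE_SEMAPHORE(tdr_sem);    //illustrative; serializes the TDR threads
 static void drm_sched_job_timedout(struct work_struct *work)
 {
        ...
+       down(&tdr_sem);               //keep TDR threads in the expected sequence
        ...                           //remove the timed out job from the mirror list, as today
        ret = job->sched->ops->timedout_job(job);
+       if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+               //put the bailing job back so the first TDR thread can still
+               //find it when re-checking guilty jobs during resubmit
+               spin_lock(&sched->job_list_lock);
+               list_add(&job->node, &sched->ring_mirror_list);
+               spin_unlock(&sched->job_list_lock);
+       }
        ...
+       up(&tdr_sem);                 //let the next TDR thread proceed
 }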


Thanks,
Jack
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Zhang, Jack (Jian)
Sent: Wednesday, March 17, 2021 11:11 PM
To: Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak


[AMD Official Use Only - Internal Distribution Only]


[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey,

Good catch, I will explore this corner case and give feedback soon~
Best,
Jack

________________________________
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Wednesday, March 17, 2021 10:50:59 PM
To: Christian König <ckoenig.leichtzumerken@gmail.com>; Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

I actually have a race condition concern here - see bellow -

On 2021-03-17 3:43 a.m., Christian König wrote:
> I was hoping Andrey would take a look since I'm really busy with other
> work right now.
>
> Regards,
> Christian.
>
> Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>> Hi, Andrey/Christian and Team,
>>
>> I didn't receive the reviewer's message from maintainers on panfrost
>> driver for several days.
>> Because this patch is urgent for my current working project,
>> would you please help to give some review ideas?
>>
>> Many Thanks,
>> Jack
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Tuesday, March 16, 2021 3:20 PM
>> To: dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>; Rob Herring <robh@kernel.org<mailto:robh@kernel.org>>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com<mailto:tomeu.vizoso@collabora.com>>; Steven Price <steven.price@arm.com<mailto:steven.price@arm.com>>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Ping
>>
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Monday, March 15, 2021 1:24 PM
>> To: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>;
>> dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>; Rob Herring <robh@kernel.org<mailto:robh@kernel.org>>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com<mailto:tomeu.vizoso@collabora.com>>; Steven Price <steven.price@arm.com<mailto:steven.price@arm.com>>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Hi, Rob/Tomeu/Steven,
>>
>> Would you please help to review this patch for panfrost driver?
>>
>> Thanks,
>> Jack Zhang
>>
>> -----Original Message-----
>> From: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> Sent: Monday, March 15, 2021 1:21 PM
>> To: dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>
>> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>>
>> re-insert Bailing jobs to avoid memory leak.
>>
>> V2: move re-insert step to drm/scheduler logic
>> V3: add panfrost's return value for bailing jobs in case it hits the
>> memleak issue.
>>
>> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>   include/drm/gpu_scheduler.h                | 1 +
>>   5 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 79b9cc73763f..86463b0f936e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>> amdgpu_device *adev,
>>                       job ? job->base.id : -1);
>>             /* even we skipped this reset, still need to set the job
>> to guilty */
>> -        if (job)
>> +        if (job) {
>>               drm_sched_increase_karma(&job->base);
>> +            r = DRM_GPU_SCHED_STAT_BAILING;
>> +        }
>>           goto skip_recovery;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 759b34799221..41390bdacd9e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       struct amdgpu_task_info ti;
>>       struct amdgpu_device *adev = ring->adev;
>> +    int ret;
>>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>> -        amdgpu_device_gpu_recover(ring->adev, job);
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>> +            return DRM_GPU_SCHED_STAT_BAILING;
>> +        else
>> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       } else {
>>           drm_sched_suspend_timeout(&ring->sched);
>>           if (amdgpu_sriov_vf(adev))
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 6003cfeb1322..e2cb4f32dae1 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>        * spurious. Bail out.
>>        */
>>       if (dma_fence_is_signaled(job->done_fence))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>           js,
>> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>         /* Scheduler is already stopped, nothing to do. */
>>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         /* Schedule a reset if there's no reset in progress. */
>>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>> a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 92d8de24d0a1..a44f621fb5c4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)  {
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>> +    int ret;
>>         sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>>           list_del_init(&job->list);
>>           spin_unlock(&sched->job_list_lock);
>>   -        job->sched->ops->timedout_job(job);
>> +        ret = job->sched->ops->timedout_job(job);
>>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>> +            spin_lock(&sched->job_list_lock);
>> +            list_add(&job->node, &sched->ring_mirror_list);
>> +            spin_unlock(&sched->job_list_lock);
>> +        }


At this point we don't hold GPU reset locks anymore, and so we could
be racing against another TDR thread from another scheduler ring of the
same device or another XGMI hive member. The other thread might be in
the middle of a lockless iteration of the mirror list (drm_sched_stop,
drm_sched_start and drm_sched_resubmit), and so locking job_list_lock
will not help. Looks like it's required to take all GPU reset locks here.

Andrey


>>           /*
>>            * Guilty job did complete and hence needs to be manually
>> removed
>>            * See drm_sched_stop doc.
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       DRM_GPU_SCHED_STAT_ENODEV,
>> +    DRM_GPU_SCHED_STAT_BAILING,
>>   };
>>     /**
>> --
>> 2.25.1
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Ce90f30af0f43444c6aea08d8e91860c4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515638213180413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NnLqtz%2BZ8%2BweYwCqRinrfkqmhzibNAF6CYSdVqL6xi0%3D&amp;reserved=0<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CJack.Zhang1%40amd.com%7C95b2ff206ee74bbe520a08d8e956f5dd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515907000888939%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=BGoSfOYiDar8SrpMx%2BsOMWpaMr87bxB%2F9ycu0FhhipA%3D&reserved=0>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 29344 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-18 10:41             ` Zhang, Jack (Jian)
@ 2021-03-18 16:16               ` Andrey Grodzovsky
  2021-03-25  9:51                 ` Zhang, Jack (Jian)
  0 siblings, 1 reply; 20+ messages in thread
From: Andrey Grodzovsky @ 2021-03-18 16:16 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	Christian König, dri-devel, amd-gfx, Koenig, Christian, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price


[-- Attachment #1.1: Type: text/plain, Size: 16978 bytes --]


On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
>
> Hi, Andrey
>
> Let me summarize the background of this patch:
>
> In the TDR resubmit step “amdgpu_device_recheck_guilty_jobs”,
>
> it will submit the first job of each ring and do the guilty-job re-check.
>
> At that point, we had to make sure each job is in the mirror list (or
> re-inserted back already).
>
> But we found the current code never re-inserts the job into the mirror
> list in the 2nd, 3rd job_timeout thread (bailing TDR thread).
>
> This will not only cause a memleak of the bailing jobs. What’s more
> important, the 1st TDR thread can never iterate over the bailing job and
> set its guilty status to a correct status.
>
> Therefore, we had to re-insert the job(or even not delete node) for 
> bailing job.
>
> For the above V3 patch, the racing condition in my mind is:
>
> we cannot make sure all bailing jobs are finished before we do 
> amdgpu_device_recheck_guilty_jobs.
>

Yes, that race I missed - so you say that for the 2nd, bailing thread
which extracted the job, even if it reinserts the job right away after
the driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small
time slot where the job is not in the mirror list, and so the 1st TDR
might miss it and not find that the 2nd job is the actual guilty job,
right? But still, this job will get back into the mirror list, and since
it's really the bad job, it will never signal completion, so on the next
timeout cycle it will be caught (of course there is a starvation scenario
here if more TDRs kick in and it bails out again, but this is really
unlikely).


> Based on this insight, I think we have two options to solve this issue:
>
>  1. Skip delete node in tdr thread2, thread3, 4 … (using mutex or
>     atomic variable)
>  2. Re-insert back bailing job, and meanwhile use semaphore in each
>     tdr thread to keep the sequence as expected and ensure each job is
>     in the mirror list when do resubmit step.
>
> For Option1, logic is simpler and we need only one global atomic variable:
>
> What do you think about this plan?
>
> Option1 should look like the following logic:
>
> +static atomic_t in_reset; //a global atomic var for synchronization
>
> static void drm_sched_process_job(struct dma_fence *f, struct 
> dma_fence_cb *cb);
>
>  /**
>
> @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct 
> work_struct *work)
>
>                  * drm_sched_cleanup_jobs. It will be reinserted back 
> after sched->thread
>
>                  * is parked at which point it's safe.
>
>                  */
>
> +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip 
> delete node if it’s thead1,2,3,….
>
> + spin_unlock(&sched->job_list_lock);
>
> + drm_sched_start_timeout(sched);
>
> +                       return;
>
> +               }
>
> +
>
> list_del_init(&job->node);
>
> spin_unlock(&sched->job_list_lock);
>
> @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct 
> work_struct *work)
>
> spin_lock(&sched->job_list_lock);
>
>         drm_sched_start_timeout(sched);
>
> spin_unlock(&sched->job_list_lock);
>
> +       atomic_set(&in_reset, 0); //reset in_reset when the first 
> thread finished tdr
>
> }
>

Technically it looks like it should work, as you don't access the job
pointer any longer and so there is no risk that, if signaled, it will be
freed by drm_sched_get_cleanup_job. But you can't just use one global
variable and bail from TDR based on it when different drivers run their
TDR threads in parallel, and even for amdgpu, when devices are in
different XGMI hives or there are 2 independent devices in a non-XGMI
setup. Some kind of GPU reset group structure should be defined at the
drm_scheduler level, and this variable would live there.
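
Purely as an illustration (none of these names exist in drm_sched today,
and the reset_domain pointer on the scheduler is assumed), such a reset
group could be sketched as:

struct drm_sched_reset_domain {                 /* hypothetical structure */
        atomic_t in_reset;      /* 0 = idle, 1 = one TDR thread owns recovery */
};

static void drm_sched_job_timedout(struct work_struct *work)
{
        struct drm_gpu_scheduler *sched =
                container_of(work, struct drm_gpu_scheduler, work_tdr.work);

        /* schedulers of one device (or one XGMI hive) would share a domain,
         * so only the first timed-out thread in that group does recovery */
        if (atomic_cmpxchg(&sched->reset_domain->in_reset, 0, 1) != 0)
                return;

        /* ... recovery as in the Option 1 sketch above ... */

        atomic_set(&sched->reset_domain->in_reset, 0);
}

The global in_reset from the Option 1 sketch would then simply move into
this per-group structure.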

P.S. I wonder why we can't just ref-count the job so that even if
drm_sched_get_cleanup_job deleted it before we had a chance to stop the
scheduler thread, we wouldn't crash. This would avoid all the dance with
deletion and reinsertion.

Andrey


> Thanks,
>
> Jack
>
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of 
> *Zhang, Jack (Jian)
> *Sent:* Wednesday, March 17, 2021 11:11 PM
> *To:* Christian König <ckoenig.leichtzumerken@gmail.com>; 
> dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
> Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk 
> <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring 
> <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven 
> Price <steven.price@arm.com>; Grodzovsky, Andrey 
> <Andrey.Grodzovsky@amd.com>
> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
> memleak
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Andrey,
>
> Good catch, I will explore this corner case and give feedback soon~
>
> Best,
>
> Jack
>
> ------------------------------------------------------------------------
>
> *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com 
> <mailto:Andrey.Grodzovsky@amd.com>>
> *Sent:* Wednesday, March 17, 2021 10:50:59 PM
> *To:* Christian König <ckoenig.leichtzumerken@gmail.com 
> <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian) 
> <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>; 
> dri-devel@lists.freedesktop.org 
> <mailto:dri-devel@lists.freedesktop.org> 
> <dri-devel@lists.freedesktop.org 
> <mailto:dri-devel@lists.freedesktop.org>>; 
> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org> 
> <amd-gfx@lists.freedesktop.org 
> <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian 
> <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu, 
> Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily 
> <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring 
> <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso 
> <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>; 
> Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
> memleak
>
> I actually have a race condition concern here - see below -
>
> On 2021-03-17 3:43 a.m., Christian König wrote:
> > I was hoping Andrey would take a look since I'm really busy with other
> > work right now.
> >
> > Regards,
> > Christian.
> >
> > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
> >> Hi, Andrey/Crhistian and Team,
> >>
> >> I didn't receive the reviewer's message from maintainers on panfrost
> >> driver for several days.
> >> Due to this patch is urgent for my current working project.
> >> Would you please help to give some review ideas?
> >>
> >> Many Thanks,
> >> Jack
> >> -----Original Message-----
> >> From: Zhang, Jack (Jian)
> >> Sent: Tuesday, March 16, 2021 3:20 PM
> >> To: dri-devel@lists.freedesktop.org 
> <mailto:dri-devel@lists.freedesktop.org>; 
> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
> >> Koenig, Christian <Christian.Koenig@amd.com 
> <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
> >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>; 
> Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
> >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring 
> <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
> >> Vizoso <tomeu.vizoso@collabora.com 
> <mailto:tomeu.vizoso@collabora.com>>; Steven Price 
> <steven.price@arm.com <mailto:steven.price@arm.com>>
> >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> >> memleak
> >>
> >> [AMD Public Use]
> >>
> >> Ping
> >>
> >> -----Original Message-----
> >> From: Zhang, Jack (Jian)
> >> Sent: Monday, March 15, 2021 1:24 PM
> >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
> >> dri-devel@lists.freedesktop.org 
> <mailto:dri-devel@lists.freedesktop.org>; 
> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
> >> Koenig, Christian <Christian.Koenig@amd.com 
> <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
> >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>; 
> Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
> >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring 
> <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
> >> Vizoso <tomeu.vizoso@collabora.com 
> <mailto:tomeu.vizoso@collabora.com>>; Steven Price 
> <steven.price@arm.com <mailto:steven.price@arm.com>>
> >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> >> memleak
> >>
> >> [AMD Public Use]
> >>
> >> Hi, Rob/Tomeu/Steven,
> >>
> >> Would you please help to review this patch for panfrost driver?
> >>
> >> Thanks,
> >> Jack Zhang
> >>
> >> -----Original Message-----
> >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
> >> Sent: Monday, March 15, 2021 1:21 PM
> >> To: dri-devel@lists.freedesktop.org 
> <mailto:dri-devel@lists.freedesktop.org>; 
> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
> >> Koenig, Christian <Christian.Koenig@amd.com 
> <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
> >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>; 
> Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
> >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
> >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com 
> <mailto:Jack.Zhang1@amd.com>>
> >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
> memleak
> >>
> >> re-insert Bailing jobs to avoid memory leak.
> >>
> >> V2: move re-insert step to drm/scheduler logic
> >> V3: add panfrost's return value for bailing jobs in case it hits the
> >> memleak issue.
> >>
> >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com 
> <mailto:Jack.Zhang1@amd.com>>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
> >> drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
> >> drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
> >> include/drm/gpu_scheduler.h                | 1 +
> >>   5 files changed, 19 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 79b9cc73763f..86463b0f936e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
> >> amdgpu_device *adev,
> >>                       job ? job->base.id : -1);
> >>             /* even we skipped this reset, still need to set the job
> >> to guilty */
> >> -        if (job)
> >> +        if (job) {
> >> drm_sched_increase_karma(&job->base);
> >> +            r = DRM_GPU_SCHED_STAT_BAILING;
> >> +        }
> >>           goto skip_recovery;
> >>       }
> >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> index 759b34799221..41390bdacd9e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
> >> amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
> >>       struct amdgpu_task_info ti;
> >>       struct amdgpu_device *adev = ring->adev;
> >> +    int ret;
> >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
> >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
> >> amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
> >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
> >> - amdgpu_device_gpu_recover(ring->adev, job);
> >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
> >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
> >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
> >> +            return DRM_GPU_SCHED_STAT_BAILING;
> >> +        else
> >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
> >>       } else {
> >> drm_sched_suspend_timeout(&ring->sched);
> >>           if (amdgpu_sriov_vf(adev))
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
> >> b/drivers/gpu/drm/panfrost/panfrost_job.c
> >> index 6003cfeb1322..e2cb4f32dae1 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
> >> panfrost_job_timedout(struct drm_sched_job
> >>        * spurious. Bail out.
> >>        */
> >>       if (dma_fence_is_signaled(job->done_fence))
> >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
> >> +        return DRM_GPU_SCHED_STAT_BAILING;
> >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
> >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
> >>           js,
> >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
> >> panfrost_job_timedout(struct drm_sched_job
> >>         /* Scheduler is already stopped, nothing to do. */
> >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
> >> +        return DRM_GPU_SCHED_STAT_BAILING;
> >>         /* Schedule a reset if there's no reset in progress. */
> >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
> >> a/drivers/gpu/drm/scheduler/sched_main.c
> >> b/drivers/gpu/drm/scheduler/sched_main.c
> >> index 92d8de24d0a1..a44f621fb5c4 100644
> >> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
> >> work_struct *work)  {
> >>       struct drm_gpu_scheduler *sched;
> >>       struct drm_sched_job *job;
> >> +    int ret;
> >>         sched = container_of(work, struct drm_gpu_scheduler,
> >> work_tdr.work);
> >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
> >> work_struct *work)
> >>           list_del_init(&job->list);
> >> spin_unlock(&sched->job_list_lock);
> >>   - job->sched->ops->timedout_job(job);
> >> +        ret = job->sched->ops->timedout_job(job);
> >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> >> + spin_lock(&sched->job_list_lock);
> >> +            list_add(&job->node, &sched->ring_mirror_list);
> >> + spin_unlock(&sched->job_list_lock);
> >> +        }
>
>
> At this point we don't hold GPU reset locks anymore, and so we could
> be racing against another TDR thread from another scheduler ring of the
> same device or another XGMI hive member. The other thread might be in
> the middle of a lockless iteration of the mirror list (drm_sched_stop,
> drm_sched_start and drm_sched_resubmit), and so locking job_list_lock
> will not help. Looks like it's required to take all GPU reset locks here.
>
> Andrey
>
>
> >>           /*
> >>            * Guilty job did complete and hence needs to be manually
> >> removed
> >>            * See drm_sched_stop doc.
> >> diff --git a/include/drm/gpu_scheduler.h
> >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
> >> --- a/include/drm/gpu_scheduler.h
> >> +++ b/include/drm/gpu_scheduler.h
> >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
> >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
> >>       DRM_GPU_SCHED_STAT_NOMINAL,
> >>       DRM_GPU_SCHED_STAT_ENODEV,
> >> +    DRM_GPU_SCHED_STAT_BAILING,
> >>   };
> >>     /**
> >> --
> >> 2.25.1
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
> >> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Ce90f30af0f43444c6aea08d8e91860c4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515638213180413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NnLqtz%2BZ8%2BweYwCqRinrfkqmhzibNAF6CYSdVqL6xi0%3D&amp;reserved=0 
> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CJack.Zhang1%40amd.com%7C95b2ff206ee74bbe520a08d8e956f5dd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515907000888939%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=BGoSfOYiDar8SrpMx%2BsOMWpaMr87bxB%2F9ycu0FhhipA%3D&reserved=0> 
>
> >>
> >
>

[-- Attachment #1.2: Type: text/html, Size: 40255 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-15  5:23 ` Zhang, Jack (Jian)
  2021-03-16  7:19   ` Zhang, Jack (Jian)
@ 2021-03-22 15:29   ` Steven Price
  2021-03-26  2:04     ` Zhang, Jack (Jian)
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-03-22 15:29 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso

On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:
> [AMD Public Use]
> 
> Hi, Rob/Tomeu/Steven,
> 
> Would you please help to review this patch for panfrost driver?
> 
> Thanks,
> Jack Zhang
> 
> -----Original Message-----
> From: Jack Zhang <Jack.Zhang1@amd.com>
> Sent: Monday, March 15, 2021 1:21 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>
> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
> 
> re-insert Bailing jobs to avoid memory leak.
> 
> V2: move re-insert step to drm/scheduler logic
> V3: add panfrost's return value for bailing jobs
> in case it hits the memleak issue.

This commit message could do with some work - it's really hard to 
decipher what the actual problem you're solving is.

> 
> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>   include/drm/gpu_scheduler.h                | 1 +
>   5 files changed, 19 insertions(+), 6 deletions(-)
> 
[...]
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 6003cfeb1322..e2cb4f32dae1 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>   	 * spurious. Bail out.
>   	 */
>   	if (dma_fence_is_signaled(job->done_fence))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>   
>   	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>   		js,
> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>   
>   	/* Scheduler is already stopped, nothing to do. */
>   	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>   
>   	/* Schedule a reset if there's no reset in progress. */
>   	if (!atomic_xchg(&pfdev->reset.pending, 1))

This looks correct to me - in these two cases drm_sched_stop() is not 
called on the sched_job, so it looks like currently the job will be leaked.
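
To spell out that leak path (a rough trace of the current flow, not
exact code):

/*
 * drm_sched_job_timedout():
 *      list_del_init(&job->list);            job leaves the pending list
 *      job->sched->ops->timedout_job(job);   panfrost bails out early, so
 *                                            drm_sched_stop()/start() never run
 *
 * Without re-inserting the job, drm_sched_get_cleanup_job() can never see
 * it again, and ops->free_job() is never called for it - hence the leak.
 */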

> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..a44f621fb5c4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   {
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_job *job;
> +	int ret;
>   
>   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>   
> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   		list_del_init(&job->list);
>   		spin_unlock(&sched->job_list_lock);
>   
> -		job->sched->ops->timedout_job(job);
> +		ret = job->sched->ops->timedout_job(job);
>   
> +		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> +			spin_lock(&sched->job_list_lock);
> +			list_add(&job->node, &sched->ring_mirror_list);
> +			spin_unlock(&sched->job_list_lock);
> +		}

I think we could really do with a comment somewhere explaining what 
"bailing" means in this context (a rough sketch of such a comment follows 
the two cases below). For Panfrost there are two cases:

  * The GPU job actually finished while the timeout code was running 
(done_fence is signalled).

  * The GPU is already in the process of being reset (Panfrost has 
multiple queues, so most likely a bad job in another queue).
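
A rough sketch of what that comment could look like (the wording here is 
purely illustrative, it is not in the patch):

enum drm_gpu_sched_stat {
	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
	DRM_GPU_SCHED_STAT_NOMINAL,
	DRM_GPU_SCHED_STAT_ENODEV,
	/*
	 * The timeout handler bailed out before stopping the scheduler:
	 * either the job completed while the handler was running, or
	 * another reset already owns recovery.  The scheduler must not
	 * treat the job as handled and has to put it back on the mirror
	 * list, otherwise the job is leaked.
	 */
	DRM_GPU_SCHED_STAT_BAILING,
};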

I'm also not convinced that (for Panfrost) it makes sense to be adding 
the jobs back to the list. For the first case above clearly the job 
could just be freed (it's complete). The second case is more interesting 
and Panfrost currently doesn't handle this well. In theory the driver 
could try to rescue the job ('soft stop' in Mali language) so that it 
could be resubmitted. Panfrost doesn't currently support that, so 
attempting to resubmit the job is almost certainly going to fail.

It's on my TODO list to look at improving Panfrost in this regard, but 
sadly still quite far down.

Steve

>   		/*
>   		 * Guilty job did complete and hence needs to be manually removed
>   		 * See drm_sched_stop doc.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 4ea8606d91fe..8093ac2427ef 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>   	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>   	DRM_GPU_SCHED_STAT_NOMINAL,
>   	DRM_GPU_SCHED_STAT_ENODEV,
> +	DRM_GPU_SCHED_STAT_BAILING,
>   };
>   
>   /**
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-18 16:16               ` Andrey Grodzovsky
@ 2021-03-25  9:51                 ` Zhang, Jack (Jian)
  2021-03-25 16:32                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-25  9:51 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Koenig, Christian, Liu, Monk, Deng,  Emily, Rob Herring,
	Tomeu Vizoso, Steven Price


[-- Attachment #1.1: Type: text/plain, Size: 21071 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey

Thank you for your good suggestions.
I agree with you that the refcount could handle the concurrent
drm_sched_get_cleanup_job case gracefully, with no need to re-insert the
job back anymore.

I quickly made a draft for this idea as follows:
How do you like it? I will start implementing it after I get your acknowledgement.

Thanks,
Jack

+void drm_job_get(struct drm_sched_job *s_job)
+{
+       kref_get(&s_job->refcount);
+}
+
+void drm_job_do_release(struct kref *ref)
+{
+       struct drm_sched_job *s_job;
+       struct drm_gpu_scheduler *sched;
+
+       s_job = container_of(ref, struct drm_sched_job, refcount);
+       sched = s_job->sched;
+       sched->ops->free_job(s_job);
+}
+
+void drm_job_put(struct drm_sched_job *s_job)
+{
+       kref_put(&s_job->refcount, drm_job_do_release);
+}
+
static void drm_sched_job_begin(struct drm_sched_job *s_job)
{
        struct drm_gpu_scheduler *sched = s_job->sched;
+       kref_init(&s_job->refcount);
+       drm_job_get(s_job);
        spin_lock(&sched->job_list_lock);
        list_add_tail(&s_job->node, &sched->ring_mirror_list);
        drm_sched_start_timeout(sched);
@@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
                 * is parked at which point it's safe.
                 */
-               list_del_init(&job->node);
+               drm_job_get(job);
                spin_unlock(&sched->job_list_lock);
                job->sched->ops->timedout_job(job);
-
+               drm_job_put(job);
                /*
                 * Guilty job did complete and hence needs to be manually removed
                 * See drm_sched_stop doc.
                 */
                if (sched->free_guilty) {
-                       job->sched->ops->free_job(job);
                        sched->free_guilty = false;
                }
        } else {
@@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
-       /*
-        * Reinsert back the bad job here - now it's safe as
-        * drm_sched_get_cleanup_job cannot race against us and release the
-        * bad job at this point - we parked (waited for) any in progress
-        * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-        * now until the scheduler thread is unparked.
-        */
-       if (bad && bad->sched == sched)
-               /*
-                * Add at the head of the queue to reflect it was the earliest
-                * job extracted.
-                */
-               list_add(&bad->node, &sched->ring_mirror_list);
-
        /*
         * Iterate the job list from later to  earlier one and either deactive
         * their HW callbacks or remove them from mirror list if they already
@@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
                                         kthread_should_stop());
                if (cleanup_job) {
-                       sched->ops->free_job(cleanup_job);
+                       drm_job_put(cleanup_job);
                        /* queue timeout for next job */
                        drm_sched_start_timeout(sched);
                }
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5a1f068af1c2..b80513eec90f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * to schedule the job.
  */
struct drm_sched_job {
+       struct kref                     refcount;
        struct spsc_node                queue_node;
        struct drm_gpu_scheduler        *sched;
        struct drm_sched_fence          *s_fence;
@@ -198,6 +199,7 @@ struct drm_sched_job {
        enum drm_sched_priority         s_priority;
        struct drm_sched_entity  *entity;
        struct dma_fence_cb             cb;
+
};
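
For readability, the reference ownership intended by the draft above can
be summarized roughly as follows (a sketch only, not part of the patch;
the split of the two initial references is an interpretation):

/*
 * drm_sched_job_begin()     kref_init() -> 1, drm_job_get() -> 2
 *                           (presumably one reference for the mirror list,
 *                            one for the in-flight submission)
 * drm_sched_job_timedout()  drm_job_get() before dropping job_list_lock,
 *                           drm_job_put() after ->timedout_job() returns
 * drm_sched_main()          drm_job_put() on the cleanup job instead of
 *                           calling ops->free_job() directly
 * last drm_job_put()        drm_job_do_release() -> sched->ops->free_job()
 */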

From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Friday, March 19, 2021 12:17 AM
To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak



On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey

Let me summarize the background of this patch:

In the TDR resubmit step “amdgpu_device_recheck_guilty_jobs”,
it will submit the first job of each ring and do the guilty-job re-check.
At that point, we had to make sure each job is in the mirror list (or re-inserted back already).

But we found the current code never re-inserts the job into the mirror list in the 2nd, 3rd job_timeout thread (bailing TDR thread).
This will not only cause a memleak of the bailing jobs. What’s more important, the 1st TDR thread can never iterate over the bailing job and set its guilty status to a correct status.

Therefore, we had to re-insert the job(or even not delete node) for bailing job.

For the above V3 patch, the racing condition in my mind is:
we cannot make sure all bailing jobs are finished before we do amdgpu_device_recheck_guilty_jobs.



Yes, that race I missed - so you say that for the 2nd, bailing thread which extracted the job, even if it reinserts the job right away after the driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small time slot where the job is not in the mirror list, and so the 1st TDR might miss it and not find that the 2nd job is the actual guilty job, right? But still, this job will get back into the mirror list, and since it's really the bad job, it will never signal completion, so on the next timeout cycle it will be caught (of course there is a starvation scenario here if more TDRs kick in and it bails out again, but this is really unlikely).



Based on this insight, I think we have two options to solve this issue:

  1.  Skip delete node in tdr thread2, thread3, 4 … (using mutex or atomic variable)
  2.  Re-insert back bailing job, and meanwhile use semaphore in each tdr thread to keep the sequence as expected and ensure each job is in the mirror list when do resubmit step.

For Option1, logic is simpler and we need only one global atomic variable:
What do you think about this plan?

Option1 should look like the following logic:


+static atomic_t in_reset;             //a global atomic var for synchronization
static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb);
 /**
@@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
                 * is parked at which point it's safe.
                 */
+               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip delete node if it’s TDR thread 2, 3, …
+                       spin_unlock(&sched->job_list_lock);
+                       drm_sched_start_timeout(sched);
+                       return;
+               }
+
                list_del_init(&job->node);
                spin_unlock(&sched->job_list_lock);
@@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
        spin_lock(&sched->job_list_lock);
        drm_sched_start_timeout(sched);
        spin_unlock(&sched->job_list_lock);
+       atomic_set(&in_reset, 0); //reset in_reset when the first thread finished tdr
}



Technically it looks like it should work, as you don't access the job pointer any longer and so there is no risk that, if signaled, it will be freed by drm_sched_get_cleanup_job. But you can't just use one global variable and bail from TDR based on it when different drivers run their TDR threads in parallel, and even for amdgpu, when devices are in different XGMI hives or there are 2 independent devices in a non-XGMI setup. Some kind of GPU reset group structure should be defined at the drm_scheduler level, and this variable would live there.

P.S. I wonder why we can't just ref-count the job so that even if drm_sched_get_cleanup_job deleted it before we had a chance to stop the scheduler thread, we wouldn't crash. This would avoid all the dance with deletion and reinsertion.

Andrey




Thanks,
Jack
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org><mailto:amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Zhang, Jack (Jian)
Sent: Wednesday, March 17, 2021 11:11 PM
To: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com><mailto:Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com><mailto:Emily.Deng@amd.com>; Rob Herring <robh@kernel.org><mailto:robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com><mailto:tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com><mailto:steven.price@arm.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com><mailto:Andrey.Grodzovsky@amd.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak


[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey,

Good catch, I will explore this corner case and give feedback soon~
Best,
Jack

________________________________
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>
Sent: Wednesday, March 17, 2021 10:50:59 PM
To: Christian König <ckoenig.leichtzumerken@gmail.com<mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian) <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org> <dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng, Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>; Rob Herring <robh@kernel.org<mailto:robh@kernel.org>>; Tomeu Vizoso <tomeu.vizoso@collabora.com<mailto:tomeu.vizoso@collabora.com>>; Steven Price <steven.price@arm.com<mailto:steven.price@arm.com>>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

I actually have a race condition concern here - see below -

On 2021-03-17 3:43 a.m., Christian König wrote:
> I was hoping Andrey would take a look since I'm really busy with other
> work right now.
>
> Regards,
> Christian.
>
> Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>> Hi, Andrey/Crhistian and Team,
>>
>> I didn't receive the reviewer's message from maintainers on panfrost
>> driver for several days.
>> Due to this patch is urgent for my current working project.
>> Would you please help to give some review ideas?
>>
>> Many Thanks,
>> Jack
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Tuesday, March 16, 2021 3:20 PM
>> To: dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>; Rob Herring <robh@kernel.org<mailto:robh@kernel.org>>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com<mailto:tomeu.vizoso@collabora.com>>; Steven Price <steven.price@arm.com<mailto:steven.price@arm.com>>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Ping
>>
>> -----Original Message-----
>> From: Zhang, Jack (Jian)
>> Sent: Monday, March 15, 2021 1:24 PM
>> To: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>;
>> dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>; Rob Herring <robh@kernel.org<mailto:robh@kernel.org>>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com<mailto:tomeu.vizoso@collabora.com>>; Steven Price <steven.price@arm.com<mailto:steven.price@arm.com>>
>> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> [AMD Public Use]
>>
>> Hi, Rob/Tomeu/Steven,
>>
>> Would you please help to review this patch for panfrost driver?
>>
>> Thanks,
>> Jack Zhang
>>
>> -----Original Message-----
>> From: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> Sent: Monday, March 15, 2021 1:21 PM
>> To: dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>> Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com<mailto:Andrey.Grodzovsky@amd.com>>; Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>; Deng,
>> Emily <Emily.Deng@amd.com<mailto:Emily.Deng@amd.com>>
>> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>>
>> re-insert Bailing jobs to avoid memory leak.
>>
>> V2: move re-insert step to drm/scheduler logic
>> V3: add panfrost's return value for bailing jobs in case it hits the
>> memleak issue.
>>
>> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com<mailto:Jack.Zhang1@amd.com>>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>   include/drm/gpu_scheduler.h                | 1 +
>>   5 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 79b9cc73763f..86463b0f936e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>> amdgpu_device *adev,
>>                       job ? job->base.id : -1);
>>             /* even we skipped this reset, still need to set the job
>> to guilty */
>> -        if (job)
>> +        if (job) {
>>               drm_sched_increase_karma(&job->base);
>> +            r = DRM_GPU_SCHED_STAT_BAILING;
>> +        }
>>           goto skip_recovery;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 759b34799221..41390bdacd9e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       struct amdgpu_task_info ti;
>>       struct amdgpu_device *adev = ring->adev;
>> +    int ret;
>>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>> -        amdgpu_device_gpu_recover(ring->adev, job);
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>> +            return DRM_GPU_SCHED_STAT_BAILING;
>> +        else
>> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       } else {
>>           drm_sched_suspend_timeout(&ring->sched);
>>           if (amdgpu_sriov_vf(adev))
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 6003cfeb1322..e2cb4f32dae1 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>        * spurious. Bail out.
>>        */
>>       if (dma_fence_is_signaled(job->done_fence))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>           js,
>> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>         /* Scheduler is already stopped, nothing to do. */
>>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
>> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>> +        return DRM_GPU_SCHED_STAT_BAILING;
>>         /* Schedule a reset if there's no reset in progress. */
>>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>> a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 92d8de24d0a1..a44f621fb5c4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)  {
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>> +    int ret;
>>         sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>>           list_del_init(&job->list);
>>           spin_unlock(&sched->job_list_lock);
>>   -        job->sched->ops->timedout_job(job);
>> +        ret = job->sched->ops->timedout_job(job);
>>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>> +            spin_lock(&sched->job_list_lock);
>> +            list_add(&job->node, &sched->ring_mirror_list);
>> +            spin_unlock(&sched->job_list_lock);
>> +        }


At this point we don't hold GPU reset locks anymore, and so we could
be racing against another TDR thread from another scheduler ring of the
same device or another XGMI hive member. The other thread might be in
the middle of a lockless iteration of the mirror list (drm_sched_stop,
drm_sched_start and drm_sched_resubmit), and so locking job_list_lock
will not help. Looks like it's required to take all GPU reset locks here.

Andrey


>>           /*
>>            * Guilty job did complete and hence needs to be manually
>> removed
>>            * See drm_sched_stop doc.
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       DRM_GPU_SCHED_STAT_ENODEV,
>> +    DRM_GPU_SCHED_STAT_BAILING,
>>   };
>>     /**
>> --
>> 2.25.1
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7Ce90f30af0f43444c6aea08d8e91860c4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515638213180413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NnLqtz%2BZ8%2BweYwCqRinrfkqmhzibNAF6CYSdVqL6xi0%3D&amp;reserved=0<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CJack.Zhang1%40amd.com%7C95b2ff206ee74bbe520a08d8e956f5dd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637515907000888939%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=BGoSfOYiDar8SrpMx%2BsOMWpaMr87bxB%2F9ycu0FhhipA%3D&reserved=0>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 44017 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-25  9:51                 ` Zhang, Jack (Jian)
@ 2021-03-25 16:32                   ` Andrey Grodzovsky
  2021-03-26  2:23                     ` Zhang, Jack (Jian)
  0 siblings, 1 reply; 20+ messages in thread
From: Andrey Grodzovsky @ 2021-03-25 16:32 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	Christian König, dri-devel, amd-gfx, Koenig, Christian, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

There are a few issues here, like how you handle non-guilty signaled jobs
in drm_sched_stop: currently it looks like you don't call put for them
and just explicitly free them as before. Also sched->free_guilty
seems useless with the new approach. Do we even need the cleanup
mechanism at drm_sched_get_cleanup_job with this approach...
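
For illustration only, that branch could drop a reference instead of
freeing directly - a sketch reusing the helpers from Jack's draft, not
the exact upstream code:

	/* in drm_sched_stop(), for a job whose HW fence already signaled */
	if (bad != s_job)
		drm_job_put(s_job);	/* instead of sched->ops->free_job(s_job) */
	else
		sched->free_guilty = true;	/* likely unnecessary with refcounts */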

But first - we need Christian to express his opinion on this, since
I think he opposed refcounting jobs and argued that we should concentrate
on fences instead.

Christian - can you chime in here ?

Andrey

On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> 
> Hi, Andrey
> 
> Thank you for your good opinions.
> 
> I literally agree with you that the refcount could solve the 
> get_clean_up_up cocurrent job gracefully, and no need to re-insert the
> 
> job back anymore.
> 
> I quickly made a draft for this idea as follows:
> 
> How do you like it? I will start implement to it after I got your 
> acknowledge.
> 
> Thanks,
> 
> Jack
> 
> +void drm_job_get(struct drm_sched_job *s_job)
> +{
> +       kref_get(&s_job->refcount);
> +}
> +
> +void drm_job_do_release(struct kref *ref)
> +{
> +       struct drm_sched_job *s_job;
> +       struct drm_gpu_scheduler *sched;
> +
> +       s_job = container_of(ref, struct drm_sched_job, refcount);
> +       sched = s_job->sched;
> +       sched->ops->free_job(s_job);
> +}
> +
> +void drm_job_put(struct drm_sched_job *s_job)
> +{
> +       kref_put(&s_job->refcount, drm_job_do_release);
> +}
> +
> static void drm_sched_job_begin(struct drm_sched_job *s_job)
> {
>          struct drm_gpu_scheduler *sched = s_job->sched;
> 
> +       kref_init(&s_job->refcount);
> +       drm_job_get(s_job);
>          spin_lock(&sched->job_list_lock);
>          list_add_tail(&s_job->node, &sched->ring_mirror_list);
>          drm_sched_start_timeout(sched);
> 
> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
>                   * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>                   * is parked at which point it's safe.
>                   */
> -               list_del_init(&job->node);
> +               drm_job_get(job);
>                  spin_unlock(&sched->job_list_lock);
> 
>                  job->sched->ops->timedout_job(job);
> -
> +               drm_job_put(job);
>                  /*
>                   * Guilty job did complete and hence needs to be manually removed
>                   * See drm_sched_stop doc.
>                   */
>                  if (sched->free_guilty) {
> -                       job->sched->ops->free_job(job);
>                          sched->free_guilty = false;
>                  }
>          } else {
> 
> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> -       /*
> -        * Reinsert back the bad job here - now it's safe as
> -        * drm_sched_get_cleanup_job cannot race against us and release the
> -        * bad job at this point - we parked (waited for) any in progress
> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> -        * now until the scheduler thread is unparked.
> -        */
> -       if (bad && bad->sched == sched)
> -               /*
> -                * Add at the head of the queue to reflect it was the earliest
> -                * job extracted.
> -                */
> -               list_add(&bad->node, &sched->ring_mirror_list);
> -
>          /*
>           * Iterate the job list from later to  earlier one and either deactive
>           * their HW callbacks or remove them from mirror list if they already
> 
> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>                                           kthread_should_stop());
>                  if (cleanup_job) {
> -                       sched->ops->free_job(cleanup_job);
> +                       drm_job_put(cleanup_job);
>                          /* queue timeout for next job */
>                          drm_sched_start_timeout(sched);
>                  }
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 5a1f068af1c2..b80513eec90f 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>    * to schedule the job.
>    */
> struct drm_sched_job {
> +       struct kref                     refcount;
>          struct spsc_node                queue_node;
>          struct drm_gpu_scheduler        *sched;
>          struct drm_sched_fence          *s_fence;
> 
> @@ -198,6 +199,7 @@ struct drm_sched_job {
>          enum drm_sched_priority         s_priority;
>          struct drm_sched_entity  *entity;
>          struct dma_fence_cb             cb;
> +
> };
> 
> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> *Sent:* Friday, March 19, 2021 12:17 AM
> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König 
> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; 
> amd-gfx@lists.freedesktop.org; Koenig, Christian 
> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily 
> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso 
> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
> memleak
> 
> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
> 
>     [AMD Official Use Only - Internal Distribution Only]
> 
>     Hi, Andrey
> 
>     Let me summarize the background of this patch:
> 
>     In the TDR resubmit step, amdgpu_device_recheck_guilty_jobs,
> 
>     it will submit the first job of each ring and do a guilty-job re-check.
> 
>     At that point, we had to make sure each job is in the mirror list (or
>     re-inserted back already).
> 
>     But we found the current code never re-inserts the job into the mirror
>     list in the 2nd, 3rd job_timeout threads (the bailing TDR threads).
> 
>     This will not only cause a memleak of the bailing jobs; what's more
>     important, the 1st TDR thread can never iterate over the bailing job and
>     set its guilty status to a correct value.
> 
>     Therefore, we had to re-insert the job (or not delete its node at all)
>     for the bailing job.
> 
>     For the above V3 patch, the racing condition in my mind is:
> 
>     we cannot make sure all bailing jobs are finished before we do
>     amdgpu_device_recheck_guilty_jobs.
> 
> Yes, that is the race I missed - so you are saying that for the 2nd,
> bailing thread that extracted the job, even if it re-inserts the job right
> away after the driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is
> a small time slot where the job is not in the mirror list, and so the 1st
> TDR thread might miss it and not find that the 2nd job is the actual guilty
> job, right? But this job will still get back into the mirror list, and
> since it really is the bad job it will never signal completion, so on the
> next timeout cycle it will be caught (of course there is a starvation
> scenario here if more TDRs kick in and it bails out again, but that is
> really unlikely).
> 
>     Based on this insight, I think we have two options to solve this issue:
> 
>      1. Skip delete node in tdr thread2, thread3, 4 … (using mutex or
>         atomic variable)
>      2. Re-insert back bailing job, and meanwhile use semaphore in each
>         tdr thread to keep the sequence as expected and ensure each job
>         is in the mirror list when do resubmit step.
> 
>     For Option1, logic is simpler and we need only one global atomic
>     variable:
> 
>     What do you think about this plan?
> 
>     Option1 should look like the following logic:
> 
>     +static atomic_t in_reset;  // a global atomic var for synchronization
> 
>     static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb);
> 
>       /**
> 
>     @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
>                       * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>                       * is parked at which point it's safe.
>                       */
>     +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  // skip delete node if it's thread 1, 2, 3, ...
>     +                       spin_unlock(&sched->job_list_lock);
>     +                       drm_sched_start_timeout(sched);
>     +                       return;
>     +               }
>     +
>                      list_del_init(&job->node);
>                      spin_unlock(&sched->job_list_lock);
> 
>     @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>              spin_lock(&sched->job_list_lock);
>              drm_sched_start_timeout(sched);
>              spin_unlock(&sched->job_list_lock);
>     +       atomic_set(&in_reset, 0);  // reset in_reset when the first thread finished TDR
>     }
> 
> Technically it looks like it should work, since you don't access the job
> pointer any longer and so there is no risk that it gets freed by
> drm_sched_get_cleanup_job if it signals. But you can't just use one global
> variable and bail out of TDR based on it when different drivers run their
> TDR threads in parallel - and even for amdgpu alone, when devices are in
> different XGMI hives or are two independent devices in a non-XGMI setup.
> Some kind of GPU reset group structure should be defined at the
> drm_scheduler level, and this variable would belong to it.
> 
> P.S I wonder why we can't just ref-count the job so that even if 
> drm_sched_get_cleanup_job would delete it before we had a chance to stop 
> the scheduler thread, we wouldn't crash. This would avoid all the dance 
> with deletion and reinsertion.
> 
> Andrey
> 
>     Thanks,
> 
>     Jack
> 
>     *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>     <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>     Jack (Jian)
>     *Sent:* Wednesday, March 17, 2021 11:11 PM
>     *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>     <mailto:ckoenig.leichtzumerken@gmail.com>;
>     dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org
>     <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>     <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>     Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>     <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>     <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>     <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>     Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>     Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>     <mailto:Andrey.Grodzovsky@amd.com>
>     *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid memleak
> 
>     [AMD Official Use Only - Internal Distribution Only]
> 
>     [AMD Official Use Only - Internal Distribution Only]
> 
>     Hi, Andrey,
> 
>     Good catch, I will explore this corner case and give feedback soon~
> 
>     Best,
> 
>     Jack
> 
>     ------------------------------------------------------------------------
> 
>     *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>     <mailto:Andrey.Grodzovsky@amd.com>>
>     *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>     *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>     <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>     <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>     dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>
>     <dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>     <amd-gfx@lists.freedesktop.org
>     <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>     <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>     Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>     <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>     <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>     <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>     Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>     *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid memleak
> 
>     I actually have a race condition concern here - see below -
> 
>     On 2021-03-17 3:43 a.m., Christian König wrote:
>      > I was hoping Andrey would take a look since I'm really busy with
>     other
>      > work right now.
>      >
>      > Regards,
>      > Christian.
>      >
>      > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>      >> Hi, Andrey/Christian and Team,
>      >>
>      >> I didn't receive the reviewers' feedback from the panfrost
>      >> maintainers for several days.
>      >> Since this patch is urgent for my current working project,
>      >> would you please help to give some review ideas?
>      >>
>      >> Many Thanks,
>      >> Jack
>      >> -----Original Message-----
>      >> From: Zhang, Jack (Jian)
>      >> Sent: Tuesday, March 16, 2021 3:20 PM
>      >> To: dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>     Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>      >> Vizoso <tomeu.vizoso@collabora.com
>     <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>     <steven.price@arm.com <mailto:steven.price@arm.com>>
>      >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid
>      >> memleak
>      >>
>      >> [AMD Public Use]
>      >>
>      >> Ping
>      >>
>      >> -----Original Message-----
>      >> From: Zhang, Jack (Jian)
>      >> Sent: Monday, March 15, 2021 1:24 PM
>      >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>      >> dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>     Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>      >> Vizoso <tomeu.vizoso@collabora.com
>     <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>     <steven.price@arm.com <mailto:steven.price@arm.com>>
>      >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid
>      >> memleak
>      >>
>      >> [AMD Public Use]
>      >>
>      >> Hi, Rob/Tomeu/Steven,
>      >>
>      >> Would you please help to review this patch for panfrost driver?
>      >>
>      >> Thanks,
>      >> Jack Zhang
>      >>
>      >> -----Original Message-----
>      >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>      >> Sent: Monday, March 15, 2021 1:21 PM
>      >> To: dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>      >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>     <mailto:Jack.Zhang1@amd.com>>
>      >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>     memleak
>      >>
>      >> re-insert Bailing jobs to avoid memory leak.
>      >>
>      >> V2: move re-insert step to drm/scheduler logic
>      >> V3: add panfrost's return value for bailing jobs in case it hits
>     the
>      >> memleak issue.
>      >>
>      >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>     <mailto:Jack.Zhang1@amd.com>>
>      >> ---
>      >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>      >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>      >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>      >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>      >>   include/drm/gpu_scheduler.h                | 1 +
>      >>   5 files changed, 19 insertions(+), 6 deletions(-)
>      >>
>      >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> index 79b9cc73763f..86463b0f936e 100644
>      >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>      >> amdgpu_device *adev,
>      >>                       job ? job->base.id : -1);
>      >>             /* even we skipped this reset, still need to set the
>     job
>      >> to guilty */
>      >> -        if (job)
>      >> +        if (job) {
>      >>               drm_sched_increase_karma(&job->base);
>      >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>      >> +        }
>      >>           goto skip_recovery;
>      >>       }
>      >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> index 759b34799221..41390bdacd9e 100644
>      >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>      >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>      >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>      >>       struct amdgpu_task_info ti;
>      >>       struct amdgpu_device *adev = ring->adev;
>      >> +    int ret;
>      >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>      >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>      >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>      >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>      >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>      >> -        amdgpu_device_gpu_recover(ring->adev, job);
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>      >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>      >> +            return DRM_GPU_SCHED_STAT_BAILING;
>      >> +        else
>      >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>      >>       } else {
>      >>           drm_sched_suspend_timeout(&ring->sched);
>      >>           if (amdgpu_sriov_vf(adev))
>      >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> index 6003cfeb1322..e2cb4f32dae1 100644
>      >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>      >> panfrost_job_timedout(struct drm_sched_job
>      >>        * spurious. Bail out.
>      >>        */
>      >>       if (dma_fence_is_signaled(job->done_fence))
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        return DRM_GPU_SCHED_STAT_BAILING;
>      >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>      >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>      >>           js,
>      >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>      >> panfrost_job_timedout(struct drm_sched_job
>      >>         /* Scheduler is already stopped, nothing to do. */
>      >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>     sched_job))
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        return DRM_GPU_SCHED_STAT_BAILING;
>      >>         /* Schedule a reset if there's no reset in progress. */
>      >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>      >> a/drivers/gpu/drm/scheduler/sched_main.c
>      >> b/drivers/gpu/drm/scheduler/sched_main.c
>      >> index 92d8de24d0a1..a44f621fb5c4 100644
>      >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>      >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>      >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>      >> work_struct *work)  {
>      >>       struct drm_gpu_scheduler *sched;
>      >>       struct drm_sched_job *job;
>      >> +    int ret;
>      >>         sched = container_of(work, struct drm_gpu_scheduler,
>      >> work_tdr.work);
>      >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>      >> work_struct *work)
>      >>           list_del_init(&job->list);
>      >>           spin_unlock(&sched->job_list_lock);
>      >>   -        job->sched->ops->timedout_job(job);
>      >> +        ret = job->sched->ops->timedout_job(job);
>      >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>      >> +            spin_lock(&sched->job_list_lock);
>      >> +            list_add(&job->node, &sched->ring_mirror_list);
>      >> +            spin_unlock(&sched->job_list_lock);
>      >> +        }
> 
> 
>     At this point we don't hold the GPU reset locks anymore, and so we could
>     be racing against another TDR thread from another scheduler ring of the
>     same device or another XGMI hive member. The other thread might be in
>     the middle of a lockless iteration of the mirror list (drm_sched_stop,
>     drm_sched_start and drm_sched_resubmit), and so locking job_list_lock
>     will not help. It looks like it's required to take all GPU reset locks
>     here.
> 
>     Andrey
> 
> 
>      >>           /*
>      >>            * Guilty job did complete and hence needs to be manually
>      >> removed
>      >>            * See drm_sched_stop doc.
>      >> diff --git a/include/drm/gpu_scheduler.h
>      >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>     100644
>      >> --- a/include/drm/gpu_scheduler.h
>      >> +++ b/include/drm/gpu_scheduler.h
>      >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>      >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>      >>       DRM_GPU_SCHED_STAT_NOMINAL,
>      >>       DRM_GPU_SCHED_STAT_ENODEV,
>      >> +    DRM_GPU_SCHED_STAT_BAILING,
>      >>   };
>      >>     /**
>      >> --
>      >> 2.25.1
>      >> _______________________________________________
>      >> amd-gfx mailing list
>      >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>      >>
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
>      >>
>      >
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-22 15:29   ` Steven Price
@ 2021-03-26  2:04     ` Zhang, Jack (Jian)
  2021-03-26  9:07       ` Steven Price
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-26  2:04 UTC (permalink / raw)
  To: Steven Price, dri-devel, amd-gfx, Koenig, Christian, Grodzovsky,
	Andrey, Liu, Monk, Deng, Emily, Rob Herring, Tomeu Vizoso

[AMD Official Use Only - Internal Distribution Only]

Hi, Steve,

Thank you for your detailed comments.

However, the patch is not finalized yet.
We found some potential race conditions even with this patch applied. The solution is under discussion, and hopefully we can find an ideal one.
After that, I will look into the other drm drivers (besides amdgpu) to see whether the change affects them.

Best,
Jack

-----Original Message-----
From: Steven Price <steven.price@arm.com>
Sent: Monday, March 22, 2021 11:29 PM
To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:
> [AMD Public Use]
>
> Hi, Rob/Tomeu/Steven,
>
> Would you please help to review this patch for panfrost driver?
>
> Thanks,
> Jack Zhang
>
> -----Original Message-----
> From: Jack Zhang <Jack.Zhang1@amd.com>
> Sent: Monday, March 15, 2021 1:21 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
> <Emily.Deng@amd.com>
> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> memleak
>
> re-insert Bailing jobs to avoid memory leak.
>
> V2: move re-insert step to drm/scheduler logic
> V3: add panfrost's return value for bailing jobs in case it hits the
> memleak issue.

This commit message could do with some work - it's really hard to decipher what the actual problem you're solving is.

>
> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>   include/drm/gpu_scheduler.h                | 1 +
>   5 files changed, 19 insertions(+), 6 deletions(-)
>
[...]
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
> b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 6003cfeb1322..e2cb4f32dae1 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>    * spurious. Bail out.
>    */
>   if (dma_fence_is_signaled(job->done_fence))
> -return DRM_GPU_SCHED_STAT_NOMINAL;
> +return DRM_GPU_SCHED_STAT_BAILING;
>
>   dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>   js,
> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
> panfrost_job_timedout(struct drm_sched_job
>
>   /* Scheduler is already stopped, nothing to do. */
>   if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> -return DRM_GPU_SCHED_STAT_NOMINAL;
> +return DRM_GPU_SCHED_STAT_BAILING;
>
>   /* Schedule a reset if there's no reset in progress. */
>   if (!atomic_xchg(&pfdev->reset.pending, 1))

This looks correct to me - in these two cases drm_sched_stop() is not called on the sched_job, so it looks like currently the job will be leaked.

> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..a44f621fb5c4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   {
>   struct drm_gpu_scheduler *sched;
>   struct drm_sched_job *job;
> +int ret;
>
>   sched = container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
>
> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   list_del_init(&job->list);
>   spin_unlock(&sched->job_list_lock);
>
> -job->sched->ops->timedout_job(job);
> +ret = job->sched->ops->timedout_job(job);
>
> +if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> +spin_lock(&sched->job_list_lock);
> +list_add(&job->node, &sched->ring_mirror_list);
> +spin_unlock(&sched->job_list_lock);
> +}

I think we could really do with a comment somewhere explaining what "bailing" means in this context. For the Panfrost case we have two cases:

  * The GPU job actually finished while the timeout code was running (done_fence is signalled).

  * The GPU is already in the process of being reset (Panfrost has multiple queues, so most likely a bad job in another queue).
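
For example, a kernel-doc comment next to the new enum value along these lines (just a sketch of the sort of thing I mean, wording to be refined) would make the intent much clearer:

/**
 * DRM_GPU_SCHED_STAT_BAILING:
 *
 * The driver's timeout handler bailed out without doing a full
 * recovery, e.g. because the job turned out to have completed
 * already or because another reset is in flight. The scheduler
 * core re-inserts the job into the pending list so it is neither
 * leaked nor hidden from a later reset.
 */
DRM_GPU_SCHED_STAT_BAILING,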

I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs back to the list. For the first case above clearly the job could just be freed (it's complete). The second case is more interesting and Panfrost currently doesn't handle this well. In theory the driver could try to rescue the job ('soft stop' in Mali language) so that it could be resubmitted. Panfrost doesn't currently support that, so attempting to resubmit the job is almost certainly going to fail.

It's on my TODO list to look at improving Panfrost in this regard, but sadly still quite far down.

Steve

>   /*
>    * Guilty job did complete and hence needs to be manually removed
>    * See drm_sched_stop doc.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 4ea8606d91fe..8093ac2427ef 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>   DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>   DRM_GPU_SCHED_STAT_NOMINAL,
>   DRM_GPU_SCHED_STAT_ENODEV,
> +DRM_GPU_SCHED_STAT_BAILING,
>   };
>
>   /**
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-25 16:32                   ` Andrey Grodzovsky
@ 2021-03-26  2:23                     ` Zhang, Jack (Jian)
  2021-03-26  9:05                       ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang, Jack (Jian) @ 2021-03-26  2:23 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Koenig, Christian, Liu, Monk, Deng,  Emily, Rob Herring,
	Tomeu Vizoso, Steven Price

[AMD Official Use Only - Internal Distribution Only]

Hi, Andrey,

>>how you handle non-guilty signaled jobs in drm_sched_stop, currently looks like you don't call put for them and just explicitly free them as before
Good point, I missed that place. Will cover that in my next patch.

>>Also sched->free_guilty seems useless with the new approach.
Yes, I agree.

>>Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
I am not quite sure about that for now, let me think about this topic today.

Hi, Christian,
should I add a fence and use get/put on that fence rather than using an explicit refcount?
And do you have any other concerns?
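
For illustration only, a minimal sketch (my own, not from the patch) of what "get/put on a fence" could look like in drm_sched_job_timedout(), assuming the driver ties the job's lifetime to the scheduler's finished fence so that the final dma_fence_put() is what ultimately releases the job (field names as in the draft above):

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->ring_mirror_list,
				       struct drm_sched_job, node);
	if (job) {
		/*
		 * Hold the finished fence (and, if the driver frees the
		 * job from the fence release path, the job itself) across
		 * the driver callback instead of taking a kref on the job.
		 */
		dma_fence_get(&job->s_fence->finished);
		spin_unlock(&sched->job_list_lock);

		job->sched->ops->timedout_job(job);

		dma_fence_put(&job->s_fence->finished);
	} else {
		spin_unlock(&sched->job_list_lock);
	}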

Thanks,
Jack

-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Friday, March 26, 2021 12:32 AM
To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

There are a few issues here, like how you handle non-guilty signaled jobs in drm_sched_stop - currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...

But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead.

Christian - can you chime in here ?

Andrey

On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
>
> Hi, Andrey
>
> Thank you for your good opinions.
>
> I literally agree with you that the refcount could solve the
> get_clean_up_up cocurrent job gracefully, and no need to re-insert the
>
> job back anymore.
>
> I quickly made a draft for this idea as follows:
>
> How do you like it? I will start implement to it after I got your
> acknowledge.
>
> Thanks,
>
> Jack
>
> +void drm_job_get(struct drm_sched_job *s_job)
>
> +{
>
> +       kref_get(&s_job->refcount);
>
> +}
>
> +
>
> +void drm_job_do_release(struct kref *ref)
>
> +{
>
> +       struct drm_sched_job *s_job;
>
> +       struct drm_gpu_scheduler *sched;
>
> +
>
> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>
> +       sched = s_job->sched;
>
> +       sched->ops->free_job(s_job);
>
> +}
>
> +
>
> +void drm_job_put(struct drm_sched_job *s_job)
>
> +{
>
> +       kref_put(&s_job->refcount, drm_job_do_release);
>
> +}
>
> +
>
> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>
> {
>
>          struct drm_gpu_scheduler *sched = s_job->sched;
>
> +       kref_init(&s_job->refcount);
>
> +       drm_job_get(s_job);
>
>          spin_lock(&sched->job_list_lock);
>
>          list_add_tail(&s_job->node, &sched->ring_mirror_list);
>
>          drm_sched_start_timeout(sched);
>
> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct
> work_struct *work)
>
>                   * drm_sched_cleanup_jobs. It will be reinserted back
> after sched->thread
>
>                   * is parked at which point it's safe.
>
>                   */
>
> -               list_del_init(&job->node);
>
> +               drm_job_get(job);
>
>                  spin_unlock(&sched->job_list_lock);
>
>                  job->sched->ops->timedout_job(job);
>
> -
>
> +               drm_job_put(job);
>
>                  /*
>
>                   * Guilty job did complete and hence needs to be
> manually removed
>
>                   * See drm_sched_stop doc.
>
>                   */
>
>                  if (sched->free_guilty) {
>
> -                       job->sched->ops->free_job(job);
>
>                          sched->free_guilty = false;
>
>                  }
>
>          } else {
>
> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler
> *sched, struct drm_sched_job *bad)
>
> -       /*
>
> -        * Reinsert back the bad job here - now it's safe as
>
> -        * drm_sched_get_cleanup_job cannot race against us and
> release the
>
> -        * bad job at this point - we parked (waited for) any in
> progress
>
> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
> be called
>
> -        * now until the scheduler thread is unparked.
>
> -        */
>
> -       if (bad && bad->sched == sched)
>
> -               /*
>
> -                * Add at the head of the queue to reflect it was the
> earliest
>
> -                * job extracted.
>
> -                */
>
> -               list_add(&bad->node, &sched->ring_mirror_list);
>
> -
>
>          /*
>
>           * Iterate the job list from later to  earlier one and either
> deactive
>
>           * their HW callbacks or remove them from mirror list if they
> already
>
> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>
>                                           kthread_should_stop());
>
>                  if (cleanup_job) {
>
> -                       sched->ops->free_job(cleanup_job);
>
> +                       drm_job_put(cleanup_job);
>
>                          /* queue timeout for next job */
>
>                          drm_sched_start_timeout(sched);
>
>                  }
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>
> index 5a1f068af1c2..b80513eec90f 100644
>
> --- a/include/drm/gpu_scheduler.h
>
> +++ b/include/drm/gpu_scheduler.h
>
> @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct
> dma_fence *f);
>
>    * to schedule the job.
>
>    */
>
> struct drm_sched_job {
>
> +       struct kref                     refcount;
>
>          struct spsc_node                queue_node;
>
>          struct drm_gpu_scheduler        *sched;
>
>          struct drm_sched_fence          *s_fence;
>
> @@ -198,6 +199,7 @@ struct drm_sched_job {
>
>          enum drm_sched_priority         s_priority;
>
>          struct drm_sched_entity  *entity;
>
>          struct dma_fence_cb             cb;
>
> +
>
> };
>
> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> *Sent:* Friday, March 19, 2021 12:17 AM
> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org; Koenig, Christian
> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> memleak
>
> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>
>     [AMD Official Use Only - Internal Distribution Only]
>
>     Hi, Andrey
>
>     Let me summarize the background of this patch:
>
>     In TDR resubmit step “amdgpu_device_recheck_guilty_jobs,
>
>     It will submit first jobs of each ring and do guilty job re-check.
>
>     At that point, We had to make sure each job is in the mirror list(or
>     re-inserted back already).
>
>     But we found the current code never re-insert the job to mirror list
>     in the 2^nd , 3^rd job_timeout thread(Bailing TDR thread).
>
>     This not only will cause memleak of the bailing jobs. What’s more
>     important, the 1^st tdr thread can never iterate the bailing job and
>     set its guilty status to a correct status.
>
>     Therefore, we had to re-insert the job(or even not delete node) for
>     bailing job.
>
>     For the above V3 patch, the racing condition in my mind is:
>
>     we cannot make sure all bailing jobs are finished before we do
>     amdgpu_device_recheck_guilty_jobs.
>
> Yes,that race i missed - so you say that for 2nd, baling thread who
> extracted the job, even if he reinsert it right away back after driver
> callback return DRM_GPU_SCHED_STAT_BAILING, there is small time slot
> where the job is not in mirror list and so the 1st TDR might miss it
> and not find that  2nd job is the actual guilty job, right ? But,
> still this job will get back into mirror list, and since it's really
> the bad job, it will never signal completion and so on the next
> timeout cycle it will be caught (of course there is a starvation
> scenario here if more TDRs kick in and it bails out again but this is really unlikely).
>
>     Based on this insight, I think we have two options to solve this issue:
>
>      1. Skip delete node in tdr thread2, thread3, 4 … (using mutex or
>         atomic variable)
>      2. Re-insert back bailing job, and meanwhile use semaphore in each
>         tdr thread to keep the sequence as expected and ensure each job
>         is in the mirror list when do resubmit step.
>
>     For Option1, logic is simpler and we need only one global atomic
>     variable:
>
>     What do you think about this plan?
>
>     Option1 should look like the following logic:
>
>     +static atomic_t in_reset;             //a global atomic var for
>     synchronization
>
>     static void drm_sched_process_job(struct dma_fence *f, struct
>     dma_fence_cb *cb);
>
>       /**
>
>     @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>     work_struct *work)
>
>                       * drm_sched_cleanup_jobs. It will be reinserted
>     back after sched->thread
>
>                       * is parked at which point it's safe.
>
>                       */
>
>     +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>     delete node if it’s thead1,2,3,….
>
>     +                       spin_unlock(&sched->job_list_lock);
>
>     +                       drm_sched_start_timeout(sched);
>
>     +                       return;
>
>     +               }
>
>     +
>
>                      list_del_init(&job->node);
>
>                      spin_unlock(&sched->job_list_lock);
>
>     @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>     work_struct *work)
>
>              spin_lock(&sched->job_list_lock);
>
>              drm_sched_start_timeout(sched);
>
>              spin_unlock(&sched->job_list_lock);
>
>     +       atomic_set(&in_reset, 0); //reset in_reset when the first
>     thread finished tdr
>
>     }
>
> Technically looks like it should work as you don't access the job
> pointer any longer and so no risk that if signaled it will be freed by
> drm_sched_get_cleanup_job but,you can't just use one global variable
> an by this bailing from TDR when different drivers run their TDR
> threads in parallel, and even for amdgpu, if devices in different XGMI
> hives or 2 independent devices in non XGMI setup. There should be
> defined some kind of GPU reset group structure on drm_scheduler level
> for which this variable would be used.
>
> P.S I wonder why we can't just ref-count the job so that even if
> drm_sched_get_cleanup_job would delete it before we had a chance to
> stop the scheduler thread, we wouldn't crash. This would avoid all the
> dance with deletion and reinsertion.
>
> Andrey
>
>     Thanks,
>
>     Jack
>
>     *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>     <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>     Jack (Jian)
>     *Sent:* Wednesday, March 17, 2021 11:11 PM
>     *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>     <mailto:ckoenig.leichtzumerken@gmail.com>;
>     dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org
>     <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>     <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>     Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>     <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>     <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>     <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>     Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>     Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>     <mailto:Andrey.Grodzovsky@amd.com>
>     *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid memleak
>
>     [AMD Official Use Only - Internal Distribution Only]
>
>     [AMD Official Use Only - Internal Distribution Only]
>
>     Hi,Andrey,
>
>     Good catch,I will expore this corner case and give feedback soon~
>
>     Best,
>
>     Jack
>
>
> ----------------------------------------------------------------------
> --
>
>     *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>     <mailto:Andrey.Grodzovsky@amd.com>>
>     *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>     *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>     <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>     <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>     dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>
>     <dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>     <amd-gfx@lists.freedesktop.org
>     <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>     <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>     Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>     <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>     <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>     <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>     Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>     *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid memleak
>
>     I actually have a race condition concern here - see bellow -
>
>     On 2021-03-17 3:43 a.m., Christian König wrote:
>      > I was hoping Andrey would take a look since I'm really busy with
>     other
>      > work right now.
>      >
>      > Regards,
>      > Christian.
>      >
>      > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>      >> Hi, Andrey/Crhistian and Team,
>      >>
>      >> I didn't receive the reviewer's message from maintainers on
>     panfrost
>      >> driver for several days.
>      >> Due to this patch is urgent for my current working project.
>      >> Would you please help to give some review ideas?
>      >>
>      >> Many Thanks,
>      >> Jack
>      >> -----Original Message-----
>      >> From: Zhang, Jack (Jian)
>      >> Sent: Tuesday, March 16, 2021 3:20 PM
>      >> To: dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>     Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>      >> Vizoso <tomeu.vizoso@collabora.com
>     <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>     <steven.price@arm.com <mailto:steven.price@arm.com>>
>      >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid
>      >> memleak
>      >>
>      >> [AMD Public Use]
>      >>
>      >> Ping
>      >>
>      >> -----Original Message-----
>      >> From: Zhang, Jack (Jian)
>      >> Sent: Monday, March 15, 2021 1:24 PM
>      >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>      >> dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>     Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>      >> Vizoso <tomeu.vizoso@collabora.com
>     <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>     <steven.price@arm.com <mailto:steven.price@arm.com>>
>      >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>     avoid
>      >> memleak
>      >>
>      >> [AMD Public Use]
>      >>
>      >> Hi, Rob/Tomeu/Steven,
>      >>
>      >> Would you please help to review this patch for panfrost driver?
>      >>
>      >> Thanks,
>      >> Jack Zhang
>      >>
>      >> -----Original Message-----
>      >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>      >> Sent: Monday, March 15, 2021 1:21 PM
>      >> To: dri-devel@lists.freedesktop.org
>     <mailto:dri-devel@lists.freedesktop.org>;
>     amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>      >> Koenig, Christian <Christian.Koenig@amd.com
>     <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>      >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>     Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>      >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>      >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>     <mailto:Jack.Zhang1@amd.com>>
>      >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>     memleak
>      >>
>      >> re-insert Bailing jobs to avoid memory leak.
>      >>
>      >> V2: move re-insert step to drm/scheduler logic
>      >> V3: add panfrost's return value for bailing jobs in case it hits
>     the
>      >> memleak issue.
>      >>
>      >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>     <mailto:Jack.Zhang1@amd.com>>
>      >> ---
>      >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>      >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>      >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>      >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>      >>   include/drm/gpu_scheduler.h                | 1 +
>      >>   5 files changed, 19 insertions(+), 6 deletions(-)
>      >>
>      >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> index 79b9cc73763f..86463b0f936e 100644
>      >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>      >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>      >> amdgpu_device *adev,
>      >>                       job ? job->base.id : -1);
>      >>             /* even we skipped this reset, still need to set the
>     job
>      >> to guilty */
>      >> -        if (job)
>      >> +        if (job) {
>      >>               drm_sched_increase_karma(&job->base);
>      >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>      >> +        }
>      >>           goto skip_recovery;
>      >>       }
>      >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> index 759b34799221..41390bdacd9e 100644
>      >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>      >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>      >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>      >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>      >>       struct amdgpu_task_info ti;
>      >>       struct amdgpu_device *adev = ring->adev;
>      >> +    int ret;
>      >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>      >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>      >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>      >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>      >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>      >> -        amdgpu_device_gpu_recover(ring->adev, job);
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>      >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>      >> +            return DRM_GPU_SCHED_STAT_BAILING;
>      >> +        else
>      >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>      >>       } else {
>      >>           drm_sched_suspend_timeout(&ring->sched);
>      >>           if (amdgpu_sriov_vf(adev))
>      >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> index 6003cfeb1322..e2cb4f32dae1 100644
>      >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>      >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>      >> panfrost_job_timedout(struct drm_sched_job
>      >>        * spurious. Bail out.
>      >>        */
>      >>       if (dma_fence_is_signaled(job->done_fence))
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        return DRM_GPU_SCHED_STAT_BAILING;
>      >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>      >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>      >>           js,
>      >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>      >> panfrost_job_timedout(struct drm_sched_job
>      >>         /* Scheduler is already stopped, nothing to do. */
>      >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>     sched_job))
>      >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>      >> +        return DRM_GPU_SCHED_STAT_BAILING;
>      >>         /* Schedule a reset if there's no reset in progress. */
>      >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>      >> a/drivers/gpu/drm/scheduler/sched_main.c
>      >> b/drivers/gpu/drm/scheduler/sched_main.c
>      >> index 92d8de24d0a1..a44f621fb5c4 100644
>      >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>      >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>      >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>      >> work_struct *work)  {
>      >>       struct drm_gpu_scheduler *sched;
>      >>       struct drm_sched_job *job;
>      >> +    int ret;
>      >>         sched = container_of(work, struct drm_gpu_scheduler,
>      >> work_tdr.work);
>      >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>      >> work_struct *work)
>      >>           list_del_init(&job->list);
>      >>           spin_unlock(&sched->job_list_lock);
>      >>   -        job->sched->ops->timedout_job(job);
>      >> +        ret = job->sched->ops->timedout_job(job);
>      >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>      >> +            spin_lock(&sched->job_list_lock);
>      >> +            list_add(&job->node, &sched->ring_mirror_list);
>      >> +            spin_unlock(&sched->job_list_lock);
>      >> +        }
>
>
>     At this point we don't hold GPU reset locks anymore, and so we could
>     be racing against another TDR thread from another scheduler ring of
>     same
>     device
>     or another XGMI hive member. The other thread might be in the middle of
>     luckless
>     iteration of mirror list (drm_sched_stop, drm_sched_start and
>     drm_sched_resubmit)
>     and so locking job_list_lock will not help. Looks like it's required to
>     take all GPU rest locks
>     here.
>
>     Andrey
>
>
>      >>           /*
>      >>            * Guilty job did complete and hence needs to be manually
>      >> removed
>      >>            * See drm_sched_stop doc.
>      >> diff --git a/include/drm/gpu_scheduler.h
>      >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>     100644
>      >> --- a/include/drm/gpu_scheduler.h
>      >> +++ b/include/drm/gpu_scheduler.h
>      >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>      >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>      >>       DRM_GPU_SCHED_STAT_NOMINAL,
>      >>       DRM_GPU_SCHED_STAT_ENODEV,
>      >> +    DRM_GPU_SCHED_STAT_BAILING,
>      >>   };
>      >>     /**
>      >> --
>      >> 2.25.1
>      >> _______________________________________________
>      >> amd-gfx mailing list
>      >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>      >>
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>      >>
>      >
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-26  2:23                     ` Zhang, Jack (Jian)
@ 2021-03-26  9:05                       ` Christian König
  2021-03-26 11:21                         ` 回复: " Liu, Monk
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2021-03-26  9:05 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Liu, Monk, Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price

Hi guys,

Am 26.03.21 um 03:23 schrieb Zhang, Jack (Jian):
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Andrey,
>
>>> how u handle non guilty singnaled jobs in drm_sched_stop, currently looks like you don't call put for them and just explicitly free them as before
> Good point, I missed that place. Will cover that in my next patch.
>
>>> Also sched->free_guilty seems useless with the new approach.
> Yes, I agree.
>
>>> Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
> I am not quite sure about that for now, let me think about this topic today.
>
> Hi, Christian,
> should I add a fence and get/put to that fence rather than using an explicit refcount?
> And another concerns?

well let me re-iterate:

For the scheduler the job is just a temporary data structure used for 
scheduling the IBs to the hardware.

While pushing the job to the hardware we get a fence structure in return 
which represents the IBs executing on the hardware.

Unfortunately we have applied a design where the job structure is also 
used for re-submitting the jobs to the hardware after a GPU reset, for 
karma handling etc etc...

All of that shouldn't have been pushed into the scheduler in the first 
place, and we should now work on getting this cleaned up rather than 
making it an even bigger mess by applying half-baked solutions.

So in my opinion adding a reference count to the job is going in the 
completely wrong direction. What we should rather do is fix the 
incorrect design decision to use jobs as the vehicle for reset handling 
in the scheduler.

To fix this I suggest the following approach:
1. We add a pointer from the drm_sched_fence back to the drm_sched_job.
2. Instead of keeping the job around in the scheduler we keep the fence 
around. For this I suggest replacing the pending_list with a ring buffer.
3. The timedout_job callback is replaced with a timeout_fence callback.
4. The free_job callback is dropped completely. Job lifetime is now 
handled in the driver, not the scheduler.
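
A minimal standalone sketch of how such a fence-centric core could look
(all names and sizes below are hypothetical - none of this exists in
drm_scheduler today, it is only meant to illustrate points 1-4 above):

#define SCHED_PENDING_RING_SIZE 64

struct sched_job;                       /* lifetime owned by the driver */

struct sched_fence {
        struct sched_job *job;          /* 1. back-pointer to the job */
        int signaled;                   /* set once the HW fence signals */
};

struct gpu_scheduler {
        /* 2. ring buffer of fences replacing the pending_list */
        struct sched_fence *pending[SCHED_PENDING_RING_SIZE];
        unsigned int head, tail;

        /* 3. the timeout callback takes the fence, not the job;
         * 4. there is no free_job() - the driver frees its own job */
        void (*timeout_fence)(struct sched_fence *fence);
};

/* drop signaled fences from the tail of the ring buffer */
static void sched_reap_pending(struct gpu_scheduler *sched)
{
        while (sched->tail != sched->head &&
               sched->pending[sched->tail % SCHED_PENDING_RING_SIZE]->signaled)
                sched->tail++;
}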

Regards,
Christian.

>
> Thanks,
> Jack
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Friday, March 26, 2021 12:32 AM
> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> There are a few issues here, like: how you handle non-guilty signaled jobs in drm_sched_stop - currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>
> But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead.
>
> Christian - can you chime in here ?
>
> Andrey
>
> On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>> Hi, Andrey
>>
>> Thank you for your good opinions.
>>
>> I literally agree with you that the refcount could solve the
>> concurrent drm_sched_get_cleanup_job race gracefully, and there is no need to re-insert the
>>
>> job back anymore.
>>
>> I quickly made a draft for this idea as follows:
>>
>> How do you like it? I will start implementing it after I get your
>> acknowledgement.
>>
>> Thanks,
>>
>> Jack
>>
>> +void drm_job_get(struct drm_sched_job *s_job)
>>
>> +{
>>
>> +       kref_get(&s_job->refcount);
>>
>> +}
>>
>> +
>>
>> +void drm_job_do_release(struct kref *ref)
>>
>> +{
>>
>> +       struct drm_sched_job *s_job;
>>
>> +       struct drm_gpu_scheduler *sched;
>>
>> +
>>
>> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>>
>> +       sched = s_job->sched;
>>
>> +       sched->ops->free_job(s_job);
>>
>> +}
>>
>> +
>>
>> +void drm_job_put(struct drm_sched_job *s_job)
>>
>> +{
>>
>> +       kref_put(&s_job->refcount, drm_job_do_release);
>>
>> +}
>>
>> +
>>
>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>
>> {
>>
>>           struct drm_gpu_scheduler *sched = s_job->sched;
>>
>> +       kref_init(&s_job->refcount);
>>
>> +       drm_job_get(s_job);
>>
>>           spin_lock(&sched->job_list_lock);
>>
>>           list_add_tail(&s_job->node, &sched->ring_mirror_list);
>>
>>           drm_sched_start_timeout(sched);
>>
>> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>>
>>                    * drm_sched_cleanup_jobs. It will be reinserted back
>> after sched->thread
>>
>>                    * is parked at which point it's safe.
>>
>>                    */
>>
>> -               list_del_init(&job->node);
>>
>> +               drm_job_get(job);
>>
>>                   spin_unlock(&sched->job_list_lock);
>>
>>                   job->sched->ops->timedout_job(job);
>>
>> -
>>
>> +               drm_job_put(job);
>>
>>                   /*
>>
>>                    * Guilty job did complete and hence needs to be
>> manually removed
>>
>>                    * See drm_sched_stop doc.
>>
>>                    */
>>
>>                   if (sched->free_guilty) {
>>
>> -                       job->sched->ops->free_job(job);
>>
>>                           sched->free_guilty = false;
>>
>>                   }
>>
>>           } else {
>>
>> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler
>> *sched, struct drm_sched_job *bad)
>>
>> -       /*
>>
>> -        * Reinsert back the bad job here - now it's safe as
>>
>> -        * drm_sched_get_cleanup_job cannot race against us and
>> release the
>>
>> -        * bad job at this point - we parked (waited for) any in
>> progress
>>
>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
>> be called
>>
>> -        * now until the scheduler thread is unparked.
>>
>> -        */
>>
>> -       if (bad && bad->sched == sched)
>>
>> -               /*
>>
>> -                * Add at the head of the queue to reflect it was the
>> earliest
>>
>> -                * job extracted.
>>
>> -                */
>>
>> -               list_add(&bad->node, &sched->ring_mirror_list);
>>
>> -
>>
>>           /*
>>
>>            * Iterate the job list from later to  earlier one and either
>> deactive
>>
>>            * their HW callbacks or remove them from mirror list if they
>> already
>>
>> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>>
>>                                            kthread_should_stop());
>>
>>                   if (cleanup_job) {
>>
>> -                       sched->ops->free_job(cleanup_job);
>>
>> +                       drm_job_put(cleanup_job);
>>
>>                           /* queue timeout for next job */
>>
>>                           drm_sched_start_timeout(sched);
>>
>>                   }
>>
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>
>> index 5a1f068af1c2..b80513eec90f 100644
>>
>> --- a/include/drm/gpu_scheduler.h
>>
>> +++ b/include/drm/gpu_scheduler.h
>>
>> @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct
>> dma_fence *f);
>>
>>     * to schedule the job.
>>
>>     */
>>
>> struct drm_sched_job {
>>
>> +       struct kref                     refcount;
>>
>>           struct spsc_node                queue_node;
>>
>>           struct drm_gpu_scheduler        *sched;
>>
>>           struct drm_sched_fence          *s_fence;
>>
>> @@ -198,6 +199,7 @@ struct drm_sched_job {
>>
>>           enum drm_sched_priority         s_priority;
>>
>>           struct drm_sched_entity  *entity;
>>
>>           struct dma_fence_cb             cb;
>>
>> +
>>
>> };
>>
>> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> *Sent:* Friday, March 19, 2021 12:17 AM
>> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      Hi, Andrey
>>
>>      Let me summarize the background of this patch:
>>
>>      In the TDR resubmit step “amdgpu_device_recheck_guilty_jobs”,
>>
>>      it will submit the first job of each ring and re-check which jobs are guilty.
>>
>>      At that point, we had to make sure each job is in the mirror list (or
>>      re-inserted back already).
>>
>>      But we found the current code never re-inserts the job into the mirror list
>>      in the 2nd, 3rd, ... job_timeout threads (bailing TDR threads).
>>
>>      This will not only cause a memleak of the bailing jobs. What’s more
>>      important, the 1st TDR thread can never iterate over the bailing job and
>>      set its guilty status correctly.
>>
>>      Therefore, we had to re-insert the job (or not delete its node at all) for the
>>      bailing job.
>>
>>      For the above V3 patch, the racing condition in my mind is:
>>
>>      we cannot make sure all bailing jobs are finished before we do
>>      amdgpu_device_recheck_guilty_jobs.
>>
>> Yes, that race I missed - so you are saying that for the 2nd, bailing thread which
>> extracted the job, even if it re-inserts it right away after the driver
>> callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small time slot
>> where the job is not in the mirror list and so the 1st TDR might miss it
>> and not find that the 2nd job is the actual guilty job, right? But
>> still, this job will get back into the mirror list, and since it's really
>> the bad job, it will never signal completion and so on the next
>> timeout cycle it will be caught (of course there is a starvation
>> scenario here if more TDRs kick in and it bails out again, but that is really unlikely).
>>
>>      Based on this insight, I think we have two options to solve this issue:
>>
>>       1. Skip deleting the node in TDR thread 2, 3, 4, … (using a mutex or
>>          atomic variable)
>>       2. Re-insert the bailing job, and meanwhile use a semaphore in each
>>          TDR thread to keep the sequence as expected and ensure each job
>>          is in the mirror list when doing the resubmit step.
>>
>>      For Option1, logic is simpler and we need only one global atomic
>>      variable:
>>
>>      What do you think about this plan?
>>
>>      Option1 should look like the following logic:
>>
>>      +static atomic_t in_reset;             //a global atomic var for
>>      synchronization
>>
>>      static void drm_sched_process_job(struct dma_fence *f, struct
>>      dma_fence_cb *cb);
>>
>>        /**
>>
>>      @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>>      work_struct *work)
>>
>>                        * drm_sched_cleanup_jobs. It will be reinserted
>>      back after sched->thread
>>
>>                        * is parked at which point it's safe.
>>
>>                        */
>>
>>      +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>>      delete node if it’s thread 2, 3, …
>>
>>      +                       spin_unlock(&sched->job_list_lock);
>>
>>      +                       drm_sched_start_timeout(sched);
>>
>>      +                       return;
>>
>>      +               }
>>
>>      +
>>
>>                       list_del_init(&job->node);
>>
>>                       spin_unlock(&sched->job_list_lock);
>>
>>      @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>>      work_struct *work)
>>
>>               spin_lock(&sched->job_list_lock);
>>
>>               drm_sched_start_timeout(sched);
>>
>>               spin_unlock(&sched->job_list_lock);
>>
>>      +       atomic_set(&in_reset, 0); //reset in_reset when the first
>>      thread finished tdr
>>
>>      }
>>
>> Technically it looks like it should work, as you don't access the job
>> pointer any longer and so there is no risk that, if signaled, it will be freed by
>> drm_sched_get_cleanup_job. But you can't just use one global variable
>> and bail from TDR based on it when different drivers run their TDR
>> threads in parallel, and even for amdgpu, when devices are in different XGMI
>> hives or are 2 independent devices in a non-XGMI setup. There should be
>> some kind of GPU reset group structure defined at the drm_scheduler level
>> to which this variable would belong.
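>>
>> A rough idea of what such a reset-group structure could look like (the
>> names here are purely hypothetical, nothing like this exists in
>> drm_scheduler today):
>>
>> /* shared by all schedulers that must be reset together, e.g. all rings
>>  * of one device or all devices of one XGMI hive */
>> struct drm_sched_reset_group {
>>         atomic_t in_reset;   /* 0 = idle, 1 = a TDR thread owns the reset */
>> };
>>
>> /* only the first TDR thread of the group proceeds with the reset,
>>  * the others bail out without touching the mirror list */
>> static bool drm_sched_reset_group_enter(struct drm_sched_reset_group *grp)
>> {
>>         return atomic_cmpxchg(&grp->in_reset, 0, 1) == 0;
>> }
>>
>> static void drm_sched_reset_group_exit(struct drm_sched_reset_group *grp)
>> {
>>         atomic_set(&grp->in_reset, 0);
>> }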
>>
>> P.S I wonder why we can't just ref-count the job so that even if
>> drm_sched_get_cleanup_job would delete it before we had a chance to
>> stop the scheduler thread, we wouldn't crash. This would avoid all the
>> dance with deletion and reinsertion.
>>
>> Andrey
>>
>>      Thanks,
>>
>>      Jack
>>
>>      *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>>      <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>>      Jack (Jian)
>>      *Sent:* Wednesday, March 17, 2021 11:11 PM
>>      *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>>      <mailto:ckoenig.leichtzumerken@gmail.com>;
>>      dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org
>>      <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>>      <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>>      Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>>      <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>>      <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>>      <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>>      Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>>      Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>      <mailto:Andrey.Grodzovsky@amd.com>
>>      *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid memleak
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      Hi,Andrey,
>>
>>      Good catch, I will explore this corner case and give feedback soon.
>>
>>      Best,
>>
>>      Jack
>>
>>
>> ------------------------------------------------------------------------
>>
>>      *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>>      <mailto:Andrey.Grodzovsky@amd.com>>
>>      *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>>      *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>>      <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>>      <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>      dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>
>>      <dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>      <amd-gfx@lists.freedesktop.org
>>      <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>>      <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>>      Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>>      <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>>      <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>>      <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>>      Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>>      *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid memleak
>>
>>      I actually have a race condition concern here - see below -
>>
>>      On 2021-03-17 3:43 a.m., Christian König wrote:
>>       > I was hoping Andrey would take a look since I'm really busy with
>>      other
>>       > work right now.
>>       >
>>       > Regards,
>>       > Christian.
>>       >
>>       > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>>       >> Hi, Andrey/Christian and Team,
>>       >>
>>       >> I didn't receive any review feedback from the maintainers of the
>>      panfrost
>>       >> driver for several days.
>>       >> Since this patch is urgent for my current project,
>>       >> would you please help to give some review comments?
>>       >>
>>       >> Many Thanks,
>>       >> Jack
>>       >> -----Original Message-----
>>       >> From: Zhang, Jack (Jian)
>>       >> Sent: Tuesday, March 16, 2021 3:20 PM
>>       >> To: dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>      Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>       >> Vizoso <tomeu.vizoso@collabora.com
>>      <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>      <steven.price@arm.com <mailto:steven.price@arm.com>>
>>       >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid
>>       >> memleak
>>       >>
>>       >> [AMD Public Use]
>>       >>
>>       >> Ping
>>       >>
>>       >> -----Original Message-----
>>       >> From: Zhang, Jack (Jian)
>>       >> Sent: Monday, March 15, 2021 1:24 PM
>>       >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>       >> dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>      Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>       >> Vizoso <tomeu.vizoso@collabora.com
>>      <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>      <steven.price@arm.com <mailto:steven.price@arm.com>>
>>       >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid
>>       >> memleak
>>       >>
>>       >> [AMD Public Use]
>>       >>
>>       >> Hi, Rob/Tomeu/Steven,
>>       >>
>>       >> Would you please help to review this patch for panfrost driver?
>>       >>
>>       >> Thanks,
>>       >> Jack Zhang
>>       >>
>>       >> -----Original Message-----
>>       >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>>       >> Sent: Monday, March 15, 2021 1:21 PM
>>       >> To: dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>>       >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>>      <mailto:Jack.Zhang1@amd.com>>
>>       >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>      memleak
>>       >>
>>       >> re-insert Bailing jobs to avoid memory leak.
>>       >>
>>       >> V2: move re-insert step to drm/scheduler logic
>>       >> V3: add panfrost's return value for bailing jobs in case it hits
>>      the
>>       >> memleak issue.
>>       >>
>>       >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>>      <mailto:Jack.Zhang1@amd.com>>
>>       >> ---
>>       >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>       >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>       >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>       >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>       >>   include/drm/gpu_scheduler.h                | 1 +
>>       >>   5 files changed, 19 insertions(+), 6 deletions(-)
>>       >>
>>       >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> index 79b9cc73763f..86463b0f936e 100644
>>       >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>>       >> amdgpu_device *adev,
>>       >>                       job ? job->base.id : -1);
>>       >>             /* even we skipped this reset, still need to set the
>>      job
>>       >> to guilty */
>>       >> -        if (job)
>>       >> +        if (job) {
>>       >>               drm_sched_increase_karma(&job->base);
>>       >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>>       >> +        }
>>       >>           goto skip_recovery;
>>       >>       }
>>       >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> index 759b34799221..41390bdacd9e 100644
>>       >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>>       >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       >>       struct amdgpu_task_info ti;
>>       >>       struct amdgpu_device *adev = ring->adev;
>>       >> +    int ret;
>>       >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>       >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>>       >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>       >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>       >> -        amdgpu_device_gpu_recover(ring->adev, job);
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>>       >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>>       >> +            return DRM_GPU_SCHED_STAT_BAILING;
>>       >> +        else
>>       >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >>       } else {
>>       >>           drm_sched_suspend_timeout(&ring->sched);
>>       >>           if (amdgpu_sriov_vf(adev))
>>       >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> index 6003cfeb1322..e2cb4f32dae1 100644
>>       >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>>       >> panfrost_job_timedout(struct drm_sched_job
>>       >>        * spurious. Bail out.
>>       >>        */
>>       >>       if (dma_fence_is_signaled(job->done_fence))
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>       >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>>       >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>       >>           js,
>>       >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>>       >> panfrost_job_timedout(struct drm_sched_job
>>       >>         /* Scheduler is already stopped, nothing to do. */
>>       >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>>      sched_job))
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>       >>         /* Schedule a reset if there's no reset in progress. */
>>       >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>>       >> a/drivers/gpu/drm/scheduler/sched_main.c
>>       >> b/drivers/gpu/drm/scheduler/sched_main.c
>>       >> index 92d8de24d0a1..a44f621fb5c4 100644
>>       >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>       >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>       >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>>       >> work_struct *work)  {
>>       >>       struct drm_gpu_scheduler *sched;
>>       >>       struct drm_sched_job *job;
>>       >> +    int ret;
>>       >>         sched = container_of(work, struct drm_gpu_scheduler,
>>       >> work_tdr.work);
>>       >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>>       >> work_struct *work)
>>       >>           list_del_init(&job->list);
>>       >>           spin_unlock(&sched->job_list_lock);
>>       >>   -        job->sched->ops->timedout_job(job);
>>       >> +        ret = job->sched->ops->timedout_job(job);
>>       >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>>       >> +            spin_lock(&sched->job_list_lock);
>>       >> +            list_add(&job->node, &sched->ring_mirror_list);
>>       >> +            spin_unlock(&sched->job_list_lock);
>>       >> +        }
>>
>>
>>      At this point we don't hold GPU reset locks anymore, and so we could
>>      be racing against another TDR thread from another scheduler ring of
>>      same
>>      device
>>      or another XGMI hive member. The other thread might be in the middle of
>>      lockless
>>      iteration of the mirror list (drm_sched_stop, drm_sched_start and
>>      drm_sched_resubmit)
>>      and so locking job_list_lock will not help. Looks like it's required to
>>      take all GPU reset locks
>>      here.
>>
>>      Andrey
>>
>>
>>       >>           /*
>>       >>            * Guilty job did complete and hence needs to be manually
>>       >> removed
>>       >>            * See drm_sched_stop doc.
>>       >> diff --git a/include/drm/gpu_scheduler.h
>>       >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>>      100644
>>       >> --- a/include/drm/gpu_scheduler.h
>>       >> +++ b/include/drm/gpu_scheduler.h
>>       >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       >>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       >>       DRM_GPU_SCHED_STAT_ENODEV,
>>       >> +    DRM_GPU_SCHED_STAT_BAILING,
>>       >>   };
>>       >>     /**
>>       >> --
>>       >> 2.25.1
>>       >> _______________________________________________
>>       >> amd-gfx mailing list
>>       >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>       >>
>>      https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>       >>
>>       >
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-26  2:04     ` Zhang, Jack (Jian)
@ 2021-03-26  9:07       ` Steven Price
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-03-26  9:07 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	dri-devel, amd-gfx, Koenig, Christian, Grodzovsky, Andrey, Liu,
	Monk, Deng, Emily, Rob Herring, Tomeu Vizoso

On 26/03/2021 02:04, Zhang, Jack (Jian) wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> Hi, Steve,
> 
> Thank you for your detailed comments.
> 
> But currently the patch is not finalized.
> We found some potential race conditions even with this patch. The solution is under discussion and hopefully we can find an ideal one.
> After that, I will start to consider whether it will influence other drm drivers (besides amdgpu).

No problem. Please keep me CC'd; the suggestion of using reference 
counts may be beneficial for Panfrost, as we already build a reference 
count on top of struct drm_sched_job. So there may be scope for cleaning 
up Panfrost afterwards even if your work doesn't directly affect it.
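
As a sketch of the general pattern being referred to - a driver-side job
structure embedding the scheduler job plus its own kref (the names below
are illustrative only, the actual Panfrost fields and helpers may differ):

struct my_driver_job {
        struct drm_sched_job base;      /* the scheduler's view of the job */
        struct kref refcount;           /* driver-owned lifetime */
        /* driver-specific state: BOs, fences, register setup, ... */
};

static void my_driver_job_release(struct kref *ref)
{
        struct my_driver_job *job =
                container_of(ref, struct my_driver_job, refcount);

        /* release driver resources here, then free the job itself */
        kfree(job);
}

static void my_driver_job_put(struct my_driver_job *job)
{
        kref_put(&job->refcount, my_driver_job_release);
}

With something like that, every path that can still touch the job (timeout
handling, reset, cleanup) takes a reference first, so it no longer matters
which of them runs last.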

Thanks,

Steve

> Best,
> Jack
> 
> -----Original Message-----
> From: Steven Price <steven.price@arm.com>
> Sent: Monday, March 22, 2021 11:29 PM
> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
> 
> On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:
>> [AMD Public Use]
>>
>> Hi, Rob/Tomeu/Steven,
>>
>> Would you please help to review this patch for panfrost driver?
>>
>> Thanks,
>> Jack Zhang
>>
>> -----Original Message-----
>> From: Jack Zhang <Jack.Zhang1@amd.com>
>> Sent: Monday, March 15, 2021 1:21 PM
>> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>> <Emily.Deng@amd.com>
>> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>
>> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> re-insert Bailing jobs to avoid memory leak.
>>
>> V2: move re-insert step to drm/scheduler logic
>> V3: add panfrost's return value for bailing jobs in case it hits the
>> memleak issue.
> 
> This commit message could do with some work - it's really hard to decipher what the actual problem you're solving is.
> 
>>
>> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>    drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>    drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>    include/drm/gpu_scheduler.h                | 1 +
>>    5 files changed, 19 insertions(+), 6 deletions(-)
>>
> [...]
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 6003cfeb1322..e2cb4f32dae1 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>>     * spurious. Bail out.
>>     */
>>    if (dma_fence_is_signaled(job->done_fence))
>> -return DRM_GPU_SCHED_STAT_NOMINAL;
>> +return DRM_GPU_SCHED_STAT_BAILING;
>>
>>    dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>    js,
>> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>> panfrost_job_timedout(struct drm_sched_job
>>
>>    /* Scheduler is already stopped, nothing to do. */
>>    if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
>> -return DRM_GPU_SCHED_STAT_NOMINAL;
>> +return DRM_GPU_SCHED_STAT_BAILING;
>>
>>    /* Schedule a reset if there's no reset in progress. */
>>    if (!atomic_xchg(&pfdev->reset.pending, 1))
> 
> This looks correct to me - in these two cases drm_sched_stop() is not called on the sched_job, so it looks like currently the job will be leaked.
> 
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 92d8de24d0a1..a44f621fb5c4 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>    {
>>    struct drm_gpu_scheduler *sched;
>>    struct drm_sched_job *job;
>> +int ret;
>>
>>    sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>>
>> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>    list_del_init(&job->list);
>>    spin_unlock(&sched->job_list_lock);
>>
>> -   job->sched->ops->timedout_job(job);
>> +   ret = job->sched->ops->timedout_job(job);
>>
>> +   if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>> +           spin_lock(&sched->job_list_lock);
>> +           list_add(&job->node, &sched->ring_mirror_list);
>> +           spin_unlock(&sched->job_list_lock);
>> +   }
> 
> I think we could really do with a comment somewhere explaining what "bailing" means in this context. For the Panfrost case we have two cases:
> 
>    * The GPU job actually finished while the timeout code was running (done_fence is signalled).
> 
>    * The GPU is already in the process of being reset (Panfrost has multiple queues, so most likely a bad job in another queue).
> 
> I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs back to the list. For the first case above clearly the job could just be freed (it's complete). The second case is more interesting and Panfrost currently doesn't handle this well. In theory the driver could try to rescue the job ('soft stop' in Mali language) so that it could be resubmitted. Panfrost doesn't currently support that, so attempting to resubmit the job is almost certainly going to fail.
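>
> As an illustration only (the wording is a sketch, not an actual patch),
> the comment asked for above could sit right over the new check in
> drm_sched_job_timedout():
>
>         ret = job->sched->ops->timedout_job(job);
>
>         /*
>          * DRM_GPU_SCHED_STAT_BAILING means the driver bailed out of its
>          * timeout handling without calling drm_sched_stop() - e.g. the job
>          * completed while the timeout handler ran, or a reset of this
>          * scheduler was already in progress.  The scheduler still owns the
>          * job in that case, so put it back on the list instead of leaking it.
>          */
>         if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>                 spin_lock(&sched->job_list_lock);
>                 list_add(&job->node, &sched->ring_mirror_list);
>                 spin_unlock(&sched->job_list_lock);
>         }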
> 
> It's on my TODO list to look at improving Panfrost in this regard, but sadly still quite far down.
> 
> Steve
> 
>>    /*
>>     * Guilty job did complete and hence needs to be manually removed
>>     * See drm_sched_stop doc.
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 4ea8606d91fe..8093ac2427ef 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>    DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>    DRM_GPU_SCHED_STAT_NOMINAL,
>>    DRM_GPU_SCHED_STAT_ENODEV,
>> +DRM_GPU_SCHED_STAT_BAILING,
>>    };
>>
>>    /**
>>
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-26  9:05                       ` Christian König
@ 2021-03-26 11:21                         ` Liu, Monk
  2021-03-26 14:51                           ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Monk @ 2021-03-26 11:21 UTC (permalink / raw)
  To: Koenig, Christian, Zhang, Jack (Jian),
	Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price
  Cc: Zhang, Andy, Jiang, Jerry (SW)

[AMD Official Use Only - Internal Distribution Only]

Hi Christian

That is not the right way to look at it; any design comes with its pros and cons, otherwise it wouldn't have made it into the kernel tree in the first place. It is just that, as time passes, we have more and more requirements and features to implement,
and those new requirements drive many new solutions or ideas, and some of the ideas you prefer need to be based on new infrastructure, that's all.

I don't see why the job "should be" or "should not be" in the scheduler; honestly speaking, I could argue with you that the "scheduler" and the TDR feature invented by AMD developers "should" never have been escalated to the drm layer at all, and under that assumption
those vendor-compatibility headaches we have right now wouldn't exist at all.

Let's just focus on the issue at hand.

The solution Andrey and Jack are working on right now looks good to me, and on the surface it can solve our problems without introducing regressions, but it is fine if you need a neater solution, since we have our project pressure (which we always have).
Either we implement the first version with Jack's patch and do the revision in another series of patches (that was also my initial suggestion), or we rework everything you mentioned. But it looks to me like you are, from time to time, asking people to rework
something at a stage where people already have a solution, which frustrates people a lot.

I would like you to prepare a solution for us which solves our headaches ... I really don't want to see you asking Jack to rework again and again.
If you are out of bandwidth or have no interest in doing this, please at least make your solution/proposal very detailed and clear; Jack told me he couldn't understand your point here.

Thanks very much, and please understand our pain here

/Monk


-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: March 26, 2021 17:06
To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

Hi guys,

Am 26.03.21 um 03:23 schrieb Zhang, Jack (Jian):
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Andrey,
>
>>> how you handle non-guilty signaled jobs in drm_sched_stop; currently it
>>> looks like you don't call put for them and just explicitly free them
>>> as before
> Good point, I missed that place. Will cover that in my next patch.
>
>>> Also sched->free_guilty seems useless with the new approach.
> Yes, I agree.
>
>>> Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
> I am not quite sure about that for now, let me think about this topic today.
>
> Hi, Christian,
> should I add a fence and get/put to that fence rather than using an explicit refcount?
> Any other concerns?

Well, let me reiterate:

For the scheduler the job is just a temporary data structure used for scheduling the IBs to the hardware.

While pushing the job to the hardware we get a fence structure in return which represents the IBs executing on the hardware.

Unfortunately we have applied a design where the job structure is also used for re-submitting the jobs to the hardware after a GPU reset, for karma handling etc etc...

All of that shouldn't have been pushed into the scheduler in the first place and we should now work on getting this cleaned up rather than making it an even bigger mess by applying half-baked solutions.

So in my opinion adding a reference count to the job is going in the completely wrong direction. What we should rather do is fix the incorrect design decision to use jobs as the vehicle for reset handling in the scheduler.

To fix this I suggest the following approach:
1. We add a pointer from the drm_sched_fence back to the drm_sched_job.
2. Instead of keeping the job around in the scheduler we keep the fence around. For this I suggest replacing the pending_list with a ring buffer.
3. The timedout_job callback is replaced with a timeout_fence callback.
4. The free_job callback is dropped completely. Job lifetime is now handled in the driver, not the scheduler.

Regards,
Christian.

>
> Thanks,
> Jack
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Friday, March 26, 2021 12:32 AM
> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org; Koenig, Christian
> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> memleak
>
> There are a few issues here, like: how you handle non-guilty signaled jobs in drm_sched_stop - currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>
> But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead.
>
> Christian - can you chime in here ?
>
> Andrey
>
> On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>> Hi, Andrey
>>
>> Thank you for your good opinions.
>>
>> I literally agree with you that the refcount could solve the
>> concurrent drm_sched_get_cleanup_job race gracefully, and there is no need to re-insert
>> the
>>
>> job back anymore.
>>
>> I quickly made a draft for this idea as follows:
>>
>> How do you like it? I will start implementing it after I get your
>> acknowledgement.
>>
>> Thanks,
>>
>> Jack
>>
>> +void drm_job_get(struct drm_sched_job *s_job)
>>
>> +{
>>
>> +       kref_get(&s_job->refcount);
>>
>> +}
>>
>> +
>>
>> +void drm_job_do_release(struct kref *ref)
>>
>> +{
>>
>> +       struct drm_sched_job *s_job;
>>
>> +       struct drm_gpu_scheduler *sched;
>>
>> +
>>
>> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>>
>> +       sched = s_job->sched;
>>
>> +       sched->ops->free_job(s_job);
>>
>> +}
>>
>> +
>>
>> +void drm_job_put(struct drm_sched_job *s_job)
>>
>> +{
>>
>> +       kref_put(&s_job->refcount, drm_job_do_release);
>>
>> +}
>>
>> +
>>
>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>
>> {
>>
>>           struct drm_gpu_scheduler *sched = s_job->sched;
>>
>> +       kref_init(&s_job->refcount);
>>
>> +       drm_job_get(s_job);
>>
>>           spin_lock(&sched->job_list_lock);
>>
>>           list_add_tail(&s_job->node, &sched->ring_mirror_list);
>>
>>           drm_sched_start_timeout(sched);
>>
>> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>>
>>                    * drm_sched_cleanup_jobs. It will be reinserted
>> back after sched->thread
>>
>>                    * is parked at which point it's safe.
>>
>>                    */
>>
>> -               list_del_init(&job->node);
>>
>> +               drm_job_get(job);
>>
>>                   spin_unlock(&sched->job_list_lock);
>>
>>                   job->sched->ops->timedout_job(job);
>>
>> -
>>
>> +               drm_job_put(job);
>>
>>                   /*
>>
>>                    * Guilty job did complete and hence needs to be
>> manually removed
>>
>>                    * See drm_sched_stop doc.
>>
>>                    */
>>
>>                   if (sched->free_guilty) {
>>
>> -                       job->sched->ops->free_job(job);
>>
>>                           sched->free_guilty = false;
>>
>>                   }
>>
>>           } else {
>>
>> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler
>> *sched, struct drm_sched_job *bad)
>>
>> -       /*
>>
>> -        * Reinsert back the bad job here - now it's safe as
>>
>> -        * drm_sched_get_cleanup_job cannot race against us and
>> release the
>>
>> -        * bad job at this point - we parked (waited for) any in
>> progress
>>
>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
>> be called
>>
>> -        * now until the scheduler thread is unparked.
>>
>> -        */
>>
>> -       if (bad && bad->sched == sched)
>>
>> -               /*
>>
>> -                * Add at the head of the queue to reflect it was the
>> earliest
>>
>> -                * job extracted.
>>
>> -                */
>>
>> -               list_add(&bad->node, &sched->ring_mirror_list);
>>
>> -
>>
>>           /*
>>
>>            * Iterate the job list from later to  earlier one and
>> either deactive
>>
>>            * their HW callbacks or remove them from mirror list if
>> they already
>>
>> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>>
>>                                            kthread_should_stop());
>>
>>                   if (cleanup_job) {
>>
>> -                       sched->ops->free_job(cleanup_job);
>>
>> +                       drm_job_put(cleanup_job);
>>
>>                           /* queue timeout for next job */
>>
>>                           drm_sched_start_timeout(sched);
>>
>>                   }
>>
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h
>>
>> index 5a1f068af1c2..b80513eec90f 100644
>>
>> --- a/include/drm/gpu_scheduler.h
>>
>> +++ b/include/drm/gpu_scheduler.h
>>
>> @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct
>> dma_fence *f);
>>
>>     * to schedule the job.
>>
>>     */
>>
>> struct drm_sched_job {
>>
>> +       struct kref                     refcount;
>>
>>           struct spsc_node                queue_node;
>>
>>           struct drm_gpu_scheduler        *sched;
>>
>>           struct drm_sched_fence          *s_fence;
>>
>> @@ -198,6 +199,7 @@ struct drm_sched_job {
>>
>>           enum drm_sched_priority         s_priority;
>>
>>           struct drm_sched_entity  *entity;
>>
>>           struct dma_fence_cb             cb;
>>
>> +
>>
>> };
>>
>> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> *Sent:* Friday, March 19, 2021 12:17 AM
>> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>> avoid memleak
>>
>> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      Hi, Andrey
>>
>>      Let me summarize the background of this patch:
>>
>>      In the TDR resubmit step “amdgpu_device_recheck_guilty_jobs”,
>>
>>      it will submit the first job of each ring and re-check which jobs are guilty.
>>
>>      At that point, we had to make sure each job is in the mirror list (or
>>      re-inserted back already).
>>
>>      But we found the current code never re-inserts the job into the mirror list
>>      in the 2nd, 3rd, ... job_timeout threads (bailing TDR threads).
>>
>>      This will not only cause a memleak of the bailing jobs. What’s more
>>      important, the 1st TDR thread can never iterate over the bailing job and
>>      set its guilty status correctly.
>>
>>      Therefore, we had to re-insert the job (or not delete its node at all) for the
>>      bailing job.
>>
>>      For the above V3 patch, the racing condition in my mind is:
>>
>>      we cannot make sure all bailing jobs are finished before we do
>>      amdgpu_device_recheck_guilty_jobs.
>>
>> Yes, that race I missed - so you are saying that for the 2nd, bailing thread which
>> extracted the job, even if it re-inserts it right away after the
>> driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small
>> time slot where the job is not in the mirror list and so the 1st TDR
>> might miss it and not find that the 2nd job is the actual guilty job,
>> right? But still, this job will get back into the mirror list, and since
>> it's really the bad job, it will never signal completion and so on
>> the next timeout cycle it will be caught (of course there is a
>> starvation scenario here if more TDRs kick in and it bails out again, but that is really unlikely).
>>
>>      Based on this insight, I think we have two options to solve this issue:
>>
>>       1. Skip deleting the node in TDR thread 2, 3, 4, … (using a mutex or
>>          atomic variable)
>>       2. Re-insert the bailing job, and meanwhile use a semaphore in each
>>          TDR thread to keep the sequence as expected and ensure each job
>>          is in the mirror list when doing the resubmit step.
>>
>>      For Option1, logic is simpler and we need only one global atomic
>>      variable:
>>
>>      What do you think about this plan?
>>
>>      Option1 should look like the following logic:
>>
>>      +static atomic_t in_reset;             //a global atomic var for
>>      synchronization
>>
>>      static void drm_sched_process_job(struct dma_fence *f, struct
>>      dma_fence_cb *cb);
>>
>>        /**
>>
>>      @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>>      work_struct *work)
>>
>>                        * drm_sched_cleanup_jobs. It will be reinserted
>>      back after sched->thread
>>
>>                        * is parked at which point it's safe.
>>
>>                        */
>>
>>      +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>>      delete node if it’s thread 2, 3, …
>>
>>      +                       spin_unlock(&sched->job_list_lock);
>>
>>      +                       drm_sched_start_timeout(sched);
>>
>>      +                       return;
>>
>>      +               }
>>
>>      +
>>
>>                       list_del_init(&job->node);
>>
>>                       spin_unlock(&sched->job_list_lock);
>>
>>      @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>>      work_struct *work)
>>
>>               spin_lock(&sched->job_list_lock);
>>
>>               drm_sched_start_timeout(sched);
>>
>>               spin_unlock(&sched->job_list_lock);
>>
>>      +       atomic_set(&in_reset, 0); //reset in_reset when the first
>>      thread finished tdr
>>
>>      }
>>
>> Technically it looks like it should work, as you don't access the job
>> pointer any longer and so there is no risk that, if signaled, it will be freed
>> by drm_sched_get_cleanup_job. But you can't just use one global
>> variable and bail from TDR based on it when different drivers run their
>> TDR threads in parallel, and even for amdgpu, when devices are in different
>> XGMI hives or are 2 independent devices in a non-XGMI setup. There should
>> be some kind of GPU reset group structure defined at the drm_scheduler
>> level to which this variable would belong.
>>
>> P.S I wonder why we can't just ref-count the job so that even if
>> drm_sched_get_cleanup_job would delete it before we had a chance to
>> stop the scheduler thread, we wouldn't crash. This would avoid all
>> the dance with deletion and reinsertion.
>>
>> Andrey
>>
>>      Thanks,
>>
>>      Jack
>>
>>      *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>>      <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>>      Jack (Jian)
>>      *Sent:* Wednesday, March 17, 2021 11:11 PM
>>      *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>>      <mailto:ckoenig.leichtzumerken@gmail.com>;
>>      dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org
>>      <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>>      <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>>      Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>>      <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>>      <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>>      <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>>      Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>>      Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>      <mailto:Andrey.Grodzovsky@amd.com>
>>      *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid memleak
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      [AMD Official Use Only - Internal Distribution Only]
>>
>>      Hi,Andrey,
>>
>>      Good catch, I will explore this corner case and give feedback
>> soon.
>>
>>      Best,
>>
>>      Jack
>>
>>
>> ------------------------------------------------------------------------
>>
>>      *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>>      <mailto:Andrey.Grodzovsky@amd.com>>
>>      *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>>      *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>>      <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>>      <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>      dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>
>>      <dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>      <amd-gfx@lists.freedesktop.org
>>      <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>>      <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>>      Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>>      <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>>      <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>>      <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>>      Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>>      *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid memleak
>>
>>      I actually have a race condition concern here - see below -
>>
>>      On 2021-03-17 3:43 a.m., Christian König wrote:
>>       > I was hoping Andrey would take a look since I'm really busy with
>>      other
>>       > work right now.
>>       >
>>       > Regards,
>>       > Christian.
>>       >
>>       > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>>       >> Hi, Andrey/Christian and Team,
>>       >>
>>       >> I didn't receive any review feedback from the maintainers of the
>>      panfrost
>>       >> driver for several days.
>>       >> Since this patch is urgent for my current project,
>>       >> would you please help to give some review comments?
>>       >>
>>       >> Many Thanks,
>>       >> Jack
>>       >> -----Original Message-----
>>       >> From: Zhang, Jack (Jian)
>>       >> Sent: Tuesday, March 16, 2021 3:20 PM
>>       >> To: dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>      Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>       >> Vizoso <tomeu.vizoso@collabora.com
>>      <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>      <steven.price@arm.com <mailto:steven.price@arm.com>>
>>       >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid
>>       >> memleak
>>       >>
>>       >> [AMD Public Use]
>>       >>
>>       >> Ping
>>       >>
>>       >> -----Original Message-----
>>       >> From: Zhang, Jack (Jian)
>>       >> Sent: Monday, March 15, 2021 1:24 PM
>>       >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>       >> dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>      Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>       >> Vizoso <tomeu.vizoso@collabora.com
>>      <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>      <steven.price@arm.com <mailto:steven.price@arm.com>>
>>       >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>      avoid
>>       >> memleak
>>       >>
>>       >> [AMD Public Use]
>>       >>
>>       >> Hi, Rob/Tomeu/Steven,
>>       >>
>>       >> Would you please help to review this patch for panfrost driver?
>>       >>
>>       >> Thanks,
>>       >> Jack Zhang
>>       >>
>>       >> -----Original Message-----
>>       >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>>       >> Sent: Monday, March 15, 2021 1:21 PM
>>       >> To: dri-devel@lists.freedesktop.org
>>      <mailto:dri-devel@lists.freedesktop.org>;
>>      amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>       >> Koenig, Christian <Christian.Koenig@amd.com
>>      <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>       >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>      Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>       >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>>       >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>>      <mailto:Jack.Zhang1@amd.com>>
>>       >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>      memleak
>>       >>
>>       >> re-insert Bailing jobs to avoid memory leak.
>>       >>
>>       >> V2: move re-insert step to drm/scheduler logic
>>       >> V3: add panfrost's return value for bailing jobs in case it hits
>>      the
>>       >> memleak issue.
>>       >>
>>       >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>>      <mailto:Jack.Zhang1@amd.com>>
>>       >> ---
>>       >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>       >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>       >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>       >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>       >>   include/drm/gpu_scheduler.h                | 1 +
>>       >>   5 files changed, 19 insertions(+), 6 deletions(-)
>>       >>
>>       >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> index 79b9cc73763f..86463b0f936e 100644
>>       >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>       >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>>       >> amdgpu_device *adev,
>>       >>                       job ? job->base.id : -1);
>>       >>             /* even we skipped this reset, still need to set the
>>      job
>>       >> to guilty */
>>       >> -        if (job)
>>       >> +        if (job) {
>>       >>               drm_sched_increase_karma(&job->base);
>>       >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>>       >> +        }
>>       >>           goto skip_recovery;
>>       >>       }
>>       >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> index 759b34799221..41390bdacd9e 100644
>>       >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>       >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>>       >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>       >>       struct amdgpu_task_info ti;
>>       >>       struct amdgpu_device *adev = ring->adev;
>>       >> +    int ret;
>>       >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>       >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>>       >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>       >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>       >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>       >> -        amdgpu_device_gpu_recover(ring->adev, job);
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>>       >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>>       >> +            return DRM_GPU_SCHED_STAT_BAILING;
>>       >> +        else
>>       >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >>       } else {
>>       >>           drm_sched_suspend_timeout(&ring->sched);
>>       >>           if (amdgpu_sriov_vf(adev))
>>       >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> index 6003cfeb1322..e2cb4f32dae1 100644
>>       >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>       >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>>       >> panfrost_job_timedout(struct drm_sched_job
>>       >>        * spurious. Bail out.
>>       >>        */
>>       >>       if (dma_fence_is_signaled(job->done_fence))
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>       >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>>       >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>       >>           js,
>>       >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>>       >> panfrost_job_timedout(struct drm_sched_job
>>       >>         /* Scheduler is already stopped, nothing to do. */
>>       >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>>      sched_job))
>>       >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>       >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>       >>         /* Schedule a reset if there's no reset in progress. */
>>       >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>>       >> a/drivers/gpu/drm/scheduler/sched_main.c
>>       >> b/drivers/gpu/drm/scheduler/sched_main.c
>>       >> index 92d8de24d0a1..a44f621fb5c4 100644
>>       >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>       >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>       >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>>       >> work_struct *work)  {
>>       >>       struct drm_gpu_scheduler *sched;
>>       >>       struct drm_sched_job *job;
>>       >> +    int ret;
>>       >>         sched = container_of(work, struct drm_gpu_scheduler,
>>       >> work_tdr.work);
>>       >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>>       >> work_struct *work)
>>       >>           list_del_init(&job->list);
>>       >>           spin_unlock(&sched->job_list_lock);
>>       >>   -        job->sched->ops->timedout_job(job);
>>       >> +        ret = job->sched->ops->timedout_job(job);
>>       >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>>       >> +            spin_lock(&sched->job_list_lock);
>>       >> +            list_add(&job->node, &sched->ring_mirror_list);
>>       >> +            spin_unlock(&sched->job_list_lock);
>>       >> +        }
>>
>>
>>      At this point we don't hold GPU reset locks anymore, and so we could
>>      be racing against another TDR thread from another scheduler ring of
>>      same
>>      device
>>      or another XGMI hive member. The other thread might be in the middle of
>>      lockless
>>      iteration of mirror list (drm_sched_stop, drm_sched_start and
>>      drm_sched_resubmit)
>>      and so locking job_list_lock will not help. Looks like it's required to
>>      take all GPU reset locks
>>      here.
>>
>>      Andrey
>>
>>
>>       >>           /*
>>       >>            * Guilty job did complete and hence needs to be manually
>>       >> removed
>>       >>            * See drm_sched_stop doc.
>>       >> diff --git a/include/drm/gpu_scheduler.h
>>       >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>>      100644
>>       >> --- a/include/drm/gpu_scheduler.h
>>       >> +++ b/include/drm/gpu_scheduler.h
>>       >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>       >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>       >>       DRM_GPU_SCHED_STAT_NOMINAL,
>>       >>       DRM_GPU_SCHED_STAT_ENODEV,
>>       >> +    DRM_GPU_SCHED_STAT_BAILING,
>>       >>   };
>>       >>     /**
>>       >> --
>>       >> 2.25.1
>>       >> _______________________________________________
>>       >> amd-gfx mailing list
>>       >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>       >>
>>
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>       >>
>>       >
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-26 11:21                         ` Re: " Liu, Monk
@ 2021-03-26 14:51                           ` Christian König
  2021-03-30  3:10                             ` Liu, Monk
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2021-03-26 14:51 UTC (permalink / raw)
  To: Liu, Monk, Zhang, Jack (Jian),
	Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price
  Cc: Zhang, Andy, Jiang, Jerry (SW)

Hi Monk,

I can't disagree more.

The fundamental problem here is that we have pushed a design without 
validating if it really fits into the concepts the Linux kernel mandates 
here.

My mistake was that I didn't push back hard enough on the initial 
design, resulting in numerous cycles of trying to save it while 
band-aiding the flaws which became obvious after a while.

I haven't counted them, but I think we already have over 10 
patches which try to work around lifetime issues of the job object, 
because I wasn't able to properly explain why this isn't going to work 
like this.

Because of this I will hard reject any attempt to band-aid this issue 
even further that doesn't start over with a design which looks like 
it is going to work.

Regards,
Christian.

On 26.03.21 at 12:21, Liu, Monk wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian
>
> This is not the correct perspective. Any design comes with its pros and cons, otherwise it wouldn't have made it into the kernel tree in the first place; it is just that, as time passes, we have more and more requirements and features to implement.
> Those new requirements drive many new solutions or ideas, and some of the ideas you prefer need to be based on a new infrastructure, that's all.
>
> I don't see why the job "should be" or "should not be" in the scheduler. Honestly speaking, I could argue that the "scheduler" and the TDR feature, which were invented by AMD developers, "should" never have been escalated to the drm layer at all, and under that assumption
> those vendor-compatibility headaches we have right now wouldn't happen at all.
>
> Let's just focus on the issue so far.
>
> The solution Andrey and Jack are working on right now looks good to me, and on the surface it can solve our problems without introducing regressions. It is fine if you need a neater solution, but we have our project pressure (which we always have).
> Either we land the first version with Jack's patch and do the revision in another series of patches (which was also my initial suggestion), or we rework everything you mentioned. But it looks to me like you keep asking people to rework
> something at a stage where they already have a solution, which frustrates people a lot.
>
> I would like you to prepare a solution for us which solves our headaches ... I really don't want to see you ask Jack to rework again and again.
> If you are out of bandwidth or have no interest in doing this, please at least make your solution/proposal very detailed and clear; Jack told me he couldn't understand your point here.
>
> Thanks very much, and please understand our pain here
>
> /Monk
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: March 26, 2021 17:06
> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> Hi guys,
>
> On 26.03.21 at 03:23, Zhang, Jack (Jian) wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi, Andrey,
>>
>>>> how you handle non-guilty signaled jobs in drm_sched_stop; currently it
>>>> looks like you don't call put for them and just explicitly free them
>>>> as before
>> Good point, I missed that place. Will cover that in my next patch.
>>
>>>> Also sched->free_guilty seems useless with the new approach.
>> Yes, I agree.
>>
>>>> Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>> I am not quite sure about that for now, let me think about this topic today.
>>
>> Hi, Christian,
>> should I add a fence and get/put to that fence rather than using an explicit refcount?
>> Any other concerns?
> well let me re-iterate:
>
> For the scheduler the job is just a temporary data structure used for scheduling the IBs to the hardware.
>
> While pushing the job to the hardware we get a fence structure in return which represents the IBs executing on the hardware.
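>
> For reference, that relationship looks roughly like this (a simplified excerpt, most fields omitted):
>
> struct drm_sched_job {
>         struct drm_gpu_scheduler        *sched;
>         struct drm_sched_fence          *s_fence;       /* fence created when the job is pushed */
>         /* ... */
> };
>
> struct drm_sched_fence {
>         struct dma_fence        scheduled;      /* signaled when the job leaves the queue */
>         struct dma_fence        finished;       /* signaled when the job's IBs have completed */
>         struct dma_fence        *parent;        /* the HW fence returned by run_job() */
>         /* ... */
> };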
>
> Unfortunately we have applied a design where the job structure is rather used for re-submitting the jobs to the hardware after a GPU reset and karma handling etc etc...
>
> All that shouldn't have been pushed into the scheduler in the first place and we should now work on getting this cleaned up rather than making it an even bigger mess by applying half-baked solutions.
>
> So in my opinion adding a reference count to the job is going in the completely wrong direction. What we should rather do is fix the incorrect design decision to use jobs as the vehicle in the scheduler for reset handling.
>
> To fix this I suggest the following approach:
> 1. We add a pointer from the drm_sched_fence back to the drm_sched_job.
> 2. Instead of keeping the job around in the scheduler we keep the fence around. For this I suggest to replace the pending_list with a ring buffer.
> 3. The timedout_job callback is replaced with a timeout_fence callback.
> 4. The free_job callback is completely dropped. Job lifetime is now handled in the driver, not the scheduler.
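>
> A rough sketch of what that fence-centric core could look like (everything below is illustrative only; names such as timedout_fence, pending_ring and DRM_SCHED_PENDING_RING_SIZE are assumptions, not existing API):
>
> #include <linux/dma-fence.h>
>
> #define DRM_SCHED_PENDING_RING_SIZE 256  /* assumed size, for illustration only */
>
> struct drm_sched_job;
>
> struct drm_sched_fence {
>         struct dma_fence        scheduled;
>         struct dma_fence        finished;
>         /* 1. back pointer from the fence to the job */
>         struct drm_sched_job    *job;
> };
>
> struct drm_gpu_scheduler {
>         /* 2. keep fences around instead of jobs: pending_list becomes a ring buffer */
>         struct drm_sched_fence  *pending_ring[DRM_SCHED_PENDING_RING_SIZE];
>         unsigned int            pending_head, pending_tail;
> };
>
> struct drm_sched_backend_ops {
>         /* 3. the timeout handler receives the fence, not the job
>          * (reuses the existing enum drm_gpu_sched_stat) */
>         enum drm_gpu_sched_stat (*timedout_fence)(struct drm_sched_fence *fence);
>         /* 4. no free_job() callback: the driver owns the job lifetime */
> };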
>
> Regards,
> Christian.
>
>> Thanks,
>> Jack
>>
>> -----Original Message-----
>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Sent: Friday, March 26, 2021 12:32 AM
>> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> There are a few issues here, like how you handle non-guilty signaled jobs in drm_sched_stop; currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>>
>> But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead.
>>
>> Christian - can you chime in here ?
>>
>> Andrey
>>
>> On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>>
>>> Hi, Andrey
>>>
>>> Thank you for your good opinions.
>>>
>>> I literally agree with you that the refcount could handle the concurrent
>>> cleanup of the job gracefully, and there is no need to re-insert the
>>> job back anymore.
>>>
>>> I quickly made a draft for this idea as follows:
>>>
>>> How do you like it? I will start implementing it after I get your
>>> acknowledgement.
>>>
>>> Thanks,
>>>
>>> Jack
>>>
>>> +void drm_job_get(struct drm_sched_job *s_job)
>>>
>>> +{
>>>
>>> +       kref_get(&s_job->refcount);
>>>
>>> +}
>>>
>>> +
>>>
>>> +void drm_job_do_release(struct kref *ref)
>>>
>>> +{
>>>
>>> +       struct drm_sched_job *s_job;
>>>
>>> +       struct drm_gpu_scheduler *sched;
>>>
>>> +
>>>
>>> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>>>
>>> +       sched = s_job->sched;
>>>
>>> +       sched->ops->free_job(s_job);
>>>
>>> +}
>>>
>>> +
>>>
>>> +void drm_job_put(struct drm_sched_job *s_job)
>>>
>>> +{
>>>
>>> +       kref_put(&s_job->refcount, drm_job_do_release);
>>>
>>> +}
>>>
>>> +
>>>
>>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>
>>> {
>>>
>>>            struct drm_gpu_scheduler *sched = s_job->sched;
>>>
>>> +       kref_init(&s_job->refcount);
>>>
>>> +       drm_job_get(s_job);
>>>
>>>            spin_lock(&sched->job_list_lock);
>>>
>>>            list_add_tail(&s_job->node, &sched->ring_mirror_list);
>>>
>>>            drm_sched_start_timeout(sched);
>>>
>>> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct
>>> work_struct *work)
>>>
>>>                     * drm_sched_cleanup_jobs. It will be reinserted
>>> back after sched->thread
>>>
>>>                     * is parked at which point it's safe.
>>>
>>>                     */
>>>
>>> -               list_del_init(&job->node);
>>>
>>> +               drm_job_get(job);
>>>
>>>                    spin_unlock(&sched->job_list_lock);
>>>
>>>                    job->sched->ops->timedout_job(job);
>>>
>>> -
>>>
>>> +               drm_job_put(job);
>>>
>>>                    /*
>>>
>>>                     * Guilty job did complete and hence needs to be
>>> manually removed
>>>
>>>                     * See drm_sched_stop doc.
>>>
>>>                     */
>>>
>>>                    if (sched->free_guilty) {
>>>
>>> -                       job->sched->ops->free_job(job);
>>>
>>>                            sched->free_guilty = false;
>>>
>>>                    }
>>>
>>>            } else {
>>>
>>> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>> *sched, struct drm_sched_job *bad)
>>>
>>> -       /*
>>>
>>> -        * Reinsert back the bad job here - now it's safe as
>>>
>>> -        * drm_sched_get_cleanup_job cannot race against us and
>>> release the
>>>
>>> -        * bad job at this point - we parked (waited for) any in
>>> progress
>>>
>>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
>>> be called
>>>
>>> -        * now until the scheduler thread is unparked.
>>>
>>> -        */
>>>
>>> -       if (bad && bad->sched == sched)
>>>
>>> -               /*
>>>
>>> -                * Add at the head of the queue to reflect it was the
>>> earliest
>>>
>>> -                * job extracted.
>>>
>>> -                */
>>>
>>> -               list_add(&bad->node, &sched->ring_mirror_list);
>>>
>>> -
>>>
>>>            /*
>>>
>>>             * Iterate the job list from later to  earlier one and
>>> either deactive
>>>
>>>             * their HW callbacks or remove them from mirror list if
>>> they already
>>>
>>> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>>>
>>>                                             kthread_should_stop());
>>>
>>>                    if (cleanup_job) {
>>>
>>> -                       sched->ops->free_job(cleanup_job);
>>>
>>> +                       drm_job_put(cleanup_job);
>>>
>>>                            /* queue timeout for next job */
>>>
>>>                            drm_sched_start_timeout(sched);
>>>
>>>                    }
>>>
>>> diff --git a/include/drm/gpu_scheduler.h
>>> b/include/drm/gpu_scheduler.h
>>>
>>> index 5a1f068af1c2..b80513eec90f 100644
>>>
>>> --- a/include/drm/gpu_scheduler.h
>>>
>>> +++ b/include/drm/gpu_scheduler.h
>>>
>>> @@ -188,6 +188,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct
>>> dma_fence *f);
>>>
>>>      * to schedule the job.
>>>
>>>      */
>>>
>>> struct drm_sched_job {
>>>
>>> +       struct kref                     refcount;
>>>
>>>            struct spsc_node                queue_node;
>>>
>>>            struct drm_gpu_scheduler        *sched;
>>>
>>>            struct drm_sched_fence          *s_fence;
>>>
>>> @@ -198,6 +199,7 @@ struct drm_sched_job {
>>>
>>>            enum drm_sched_priority         s_priority;
>>>
>>>            struct drm_sched_entity  *entity;
>>>
>>>            struct dma_fence_cb             cb;
>>>
>>> +
>>>
>>> };
>>>
>>> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> *Sent:* Friday, March 19, 2021 12:17 AM
>>> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
>>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>>> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>> avoid memleak
>>>
>>> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       Hi, Andrey
>>>
>>>       Let me summarize the background of this patch:
>>>
>>>       In the TDR resubmit step, “amdgpu_device_recheck_guilty_jobs”,
>>>
>>>       we submit the first job of each ring and re-check which job is guilty.
>>>
>>>       At that point, we had to make sure each job is in the mirror list (or
>>>       re-inserted back already).
>>>
>>>       But we found that the current code never re-inserts the job into the
>>>       mirror list in the 2^nd, 3^rd job_timeout threads (bailing TDR threads).
>>>
>>>       This will not only cause a memleak of the bailing jobs; what’s more
>>>       important, the 1^st tdr thread can never iterate over the bailing job and
>>>       set its guilty status to a correct status.
>>>
>>>       Therefore, we had to re-insert the job (or not even delete the node) for
>>>       the bailing job.
>>>
>>>       For the above V3 patch, the racing condition in my mind is:
>>>
>>>       we cannot make sure all bailing jobs are finished before we do
>>>       amdgpu_device_recheck_guilty_jobs.
>>>
>>> Yes, that's the race I missed - so you are saying that for the 2nd, bailing
>>> thread which extracted the job, even if it reinserts the job right away
>>> after the driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small
>>> time slot where the job is not in the mirror list, and so the 1st TDR
>>> might miss it and not find that the 2nd job is the actual guilty job,
>>> right? But this job will still get back into the mirror list, and since
>>> it's really the bad job, it will never signal completion, and so on
>>> the next timeout cycle it will be caught (of course there is a
>>> starvation scenario here if more TDRs kick in and it bails out again, but this is really unlikely).
>>>
>>>       Based on this insight, I think we have two options to solve this issue:
>>>
>>>        1. Skip deleting the node in tdr thread 2, thread 3, 4 … (using a mutex
>>>           or atomic variable)
>>>        2. Re-insert the bailing job back, and meanwhile use a semaphore in each
>>>           tdr thread to keep the sequence as expected and ensure each job
>>>           is in the mirror list when doing the resubmit step.
>>>
>>>       For Option1, logic is simpler and we need only one global atomic
>>>       variable:
>>>
>>>       What do you think about this plan?
>>>
>>>       Option1 should look like the following logic:
>>>
>>>       +static atomic_t in_reset;             //a global atomic var for
>>>       synchronization
>>>
>>>       static void drm_sched_process_job(struct dma_fence *f, struct
>>>       dma_fence_cb *cb);
>>>
>>>         /**
>>>
>>>       @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>>>       work_struct *work)
>>>
>>>                         * drm_sched_cleanup_jobs. It will be reinserted
>>>       back after sched->thread
>>>
>>>                         * is parked at which point it's safe.
>>>
>>>                         */
>>>
>>>       +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>>>       delete node if it’s thread 1, 2, 3, …
>>>
>>>       +                       spin_unlock(&sched->job_list_lock);
>>>
>>>       +                       drm_sched_start_timeout(sched);
>>>
>>>       +                       return;
>>>
>>>       +               }
>>>
>>>       +
>>>
>>>                        list_del_init(&job->node);
>>>
>>>                        spin_unlock(&sched->job_list_lock);
>>>
>>>       @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>>>       work_struct *work)
>>>
>>>                spin_lock(&sched->job_list_lock);
>>>
>>>                drm_sched_start_timeout(sched);
>>>
>>>                spin_unlock(&sched->job_list_lock);
>>>
>>>       +       atomic_set(&in_reset, 0); //reset in_reset when the first
>>>       thread finished tdr
>>>
>>>       }
>>>
>>> Technically it looks like it should work, as you don't access the job
>>> pointer any longer, so there is no risk that it will be freed by
>>> drm_sched_get_cleanup_job if it signals. But you can't just use one global
>>> variable and bail from TDR based on it when different drivers run their
>>> TDR threads in parallel, and even for amdgpu, when devices are in different
>>> XGMI hives or are 2 independent devices in a non-XGMI setup. There should
>>> be some kind of GPU reset group structure defined at the drm_scheduler
>>> level for which this variable would be used.
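>>>
>>> A rough sketch of such a reset-group guard (the drm_sched_reset_domain name and its layout are purely illustrative assumptions):
>>>
>>> #include <linux/atomic.h>
>>> #include <linux/types.h>
>>>
>>> /* Schedulers that reset together (one device, or one XGMI hive) would
>>>  * share one domain instead of a single global variable. */
>>> struct drm_sched_reset_domain {
>>>         atomic_t in_reset;      /* 0 = idle, 1 = a TDR thread owns the reset */
>>> };
>>>
>>> static bool drm_sched_reset_domain_try_enter(struct drm_sched_reset_domain *d)
>>> {
>>>         /* only the first TDR thread of the domain wins and proceeds */
>>>         return atomic_cmpxchg(&d->in_reset, 0, 1) == 0;
>>> }
>>>
>>> static void drm_sched_reset_domain_exit(struct drm_sched_reset_domain *d)
>>> {
>>>         atomic_set(&d->in_reset, 0);
>>> }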
>>>
>>> P.S I wonder why we can't just ref-count the job so that even if
>>> drm_sched_get_cleanup_job would delete it before we had a chance to
>>> stop the scheduler thread, we wouldn't crash. This would avoid all
>>> the dance with deletion and reinsertion.
>>>
>>> Andrey
>>>
>>>       Thanks,
>>>
>>>       Jack
>>>
>>>       *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>>>       <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>>>       Jack (Jian)
>>>       *Sent:* Wednesday, March 17, 2021 11:11 PM
>>>       *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>>>       <mailto:ckoenig.leichtzumerken@gmail.com>;
>>>       dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org
>>>       <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>>>       <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>>>       Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>>>       <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>>>       <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>>>       <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>>>       Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>>>       Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>>       <mailto:Andrey.Grodzovsky@amd.com>
>>>       *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid memleak
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       Hi, Andrey,
>>>
>>>       Good catch, I will explore this corner case and give feedback
>>> soon~
>>>
>>>       Best,
>>>
>>>       Jack
>>>
>>>
>>> ---------------------------------------------------------------------
>>> -
>>> --
>>>
>>>       *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>>>       <mailto:Andrey.Grodzovsky@amd.com>>
>>>       *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>>>       *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>>>       <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>>>       <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>       dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>
>>>       <dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>       <amd-gfx@lists.freedesktop.org
>>>       <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>>>       <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>>>       Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>>>       <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>>>       <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>>>       <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>>>       Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>       *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid memleak
>>>
>>>       I actually have a race condition concern here - see bellow -
>>>
>>>       On 2021-03-17 3:43 a.m., Christian König wrote:
>>>        > I was hoping Andrey would take a look since I'm really busy with
>>>       other
>>>        > work right now.
>>>        >
>>>        > Regards,
>>>        > Christian.
>>>        >
>>>        > On 17.03.21 at 07:46, Zhang, Jack (Jian) wrote:
>>>        >> Hi, Andrey/Christian and Team,
>>>        >>
>>>        >> I didn't receive the reviewer's message from maintainers on
>>>       panfrost
>>>        >> driver for several days.
>>>        >> This patch is urgent for my current working project.
>>>        >> Would you please help to give some review ideas?
>>>        >>
>>>        >> Many Thanks,
>>>        >> Jack
>>>        >> -----Original Message-----
>>>        >> From: Zhang, Jack (Jian)
>>>        >> Sent: Tuesday, March 16, 2021 3:20 PM
>>>        >> To: dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>       Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>        >> Vizoso <tomeu.vizoso@collabora.com
>>>       <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>       <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>        >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid
>>>        >> memleak
>>>        >>
>>>        >> [AMD Public Use]
>>>        >>
>>>        >> Ping
>>>        >>
>>>        >> -----Original Message-----
>>>        >> From: Zhang, Jack (Jian)
>>>        >> Sent: Monday, March 15, 2021 1:24 PM
>>>        >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>        >> dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>       Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>        >> Vizoso <tomeu.vizoso@collabora.com
>>>       <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>       <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>        >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid
>>>        >> memleak
>>>        >>
>>>        >> [AMD Public Use]
>>>        >>
>>>        >> Hi, Rob/Tomeu/Steven,
>>>        >>
>>>        >> Would you please help to review this patch for panfrost driver?
>>>        >>
>>>        >> Thanks,
>>>        >> Jack Zhang
>>>        >>
>>>        >> -----Original Message-----
>>>        >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>>>        >> Sent: Monday, March 15, 2021 1:21 PM
>>>        >> To: dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>>>        >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>>>       <mailto:Jack.Zhang1@amd.com>>
>>>        >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>>       memleak
>>>        >>
>>>        >> re-insert Bailing jobs to avoid memory leak.
>>>        >>
>>>        >> V2: move re-insert step to drm/scheduler logic
>>>        >> V3: add panfrost's return value for bailing jobs in case it hits
>>>       the
>>>        >> memleak issue.
>>>        >>
>>>        >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>>>       <mailto:Jack.Zhang1@amd.com>>
>>>        >> ---
>>>        >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>>        >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>>        >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>>        >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>>        >>   include/drm/gpu_scheduler.h                | 1 +
>>>        >>   5 files changed, 19 insertions(+), 6 deletions(-)
>>>        >>
>>>        >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> index 79b9cc73763f..86463b0f936e 100644
>>>        >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>>>        >> amdgpu_device *adev,
>>>        >>                       job ? job->base.id : -1);
>>>        >>             /* even we skipped this reset, still need to set the
>>>       job
>>>        >> to guilty */
>>>        >> -        if (job)
>>>        >> +        if (job) {
>>>        >>               drm_sched_increase_karma(&job->base);
>>>        >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>>>        >> +        }
>>>        >>           goto skip_recovery;
>>>        >>       }
>>>        >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> index 759b34799221..41390bdacd9e 100644
>>>        >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>>>        >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>        >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>>        >>       struct amdgpu_task_info ti;
>>>        >>       struct amdgpu_device *adev = ring->adev;
>>>        >> +    int ret;
>>>        >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>        >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>>>        >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>        >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>>        >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>>        >> -        amdgpu_device_gpu_recover(ring->adev, job);
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>>>        >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>>>        >> +            return DRM_GPU_SCHED_STAT_BAILING;
>>>        >> +        else
>>>        >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >>       } else {
>>>        >>           drm_sched_suspend_timeout(&ring->sched);
>>>        >>           if (amdgpu_sriov_vf(adev))
>>>        >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> index 6003cfeb1322..e2cb4f32dae1 100644
>>>        >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>>>        >> panfrost_job_timedout(struct drm_sched_job
>>>        >>        * spurious. Bail out.
>>>        >>        */
>>>        >>       if (dma_fence_is_signaled(job->done_fence))
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>        >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>>>        >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>>        >>           js,
>>>        >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>>>        >> panfrost_job_timedout(struct drm_sched_job
>>>        >>         /* Scheduler is already stopped, nothing to do. */
>>>        >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>>>       sched_job))
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>        >>         /* Schedule a reset if there's no reset in progress. */
>>>        >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>>>        >> a/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> b/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> index 92d8de24d0a1..a44f621fb5c4 100644
>>>        >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>>>        >> work_struct *work)  {
>>>        >>       struct drm_gpu_scheduler *sched;
>>>        >>       struct drm_sched_job *job;
>>>        >> +    int ret;
>>>        >>         sched = container_of(work, struct drm_gpu_scheduler,
>>>        >> work_tdr.work);
>>>        >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>>>        >> work_struct *work)
>>>        >>           list_del_init(&job->list);
>>>        >>           spin_unlock(&sched->job_list_lock);
>>>        >>   -        job->sched->ops->timedout_job(job);
>>>        >> +        ret = job->sched->ops->timedout_job(job);
>>>        >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>>>        >> +            spin_lock(&sched->job_list_lock);
>>>        >> +            list_add(&job->node, &sched->ring_mirror_list);
>>>        >> +            spin_unlock(&sched->job_list_lock);
>>>        >> +        }
>>>
>>>
>>>       At this point we don't hold GPU reset locks anymore, and so we could
>>>       be racing against another TDR thread from another scheduler ring of
>>>       same
>>>       device
>>>       or another XGMI hive member. The other thread might be in the middle of
>>>       lockless
>>>       iteration of mirror list (drm_sched_stop, drm_sched_start and
>>>       drm_sched_resubmit)
>>>       and so locking job_list_lock will not help. Looks like it's required to
>>>       take all GPU reset locks
>>>       here.
>>>
>>>       Andrey
>>>
>>>
>>>        >>           /*
>>>        >>            * Guilty job did complete and hence needs to be manually
>>>        >> removed
>>>        >>            * See drm_sched_stop doc.
>>>        >> diff --git a/include/drm/gpu_scheduler.h
>>>        >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>>>       100644
>>>        >> --- a/include/drm/gpu_scheduler.h
>>>        >> +++ b/include/drm/gpu_scheduler.h
>>>        >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>>        >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>>        >>       DRM_GPU_SCHED_STAT_NOMINAL,
>>>        >>       DRM_GPU_SCHED_STAT_ENODEV,
>>>        >> +    DRM_GPU_SCHED_STAT_BAILING,
>>>        >>   };
>>>        >>     /**
>>>        >> --
>>>        >> 2.25.1
>>>        >> _______________________________________________
>>>        >> amd-gfx mailing list
>>>        >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>        >>
>>>
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>        >>
>>>        >
>>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-26 14:51                           ` Christian König
@ 2021-03-30  3:10                             ` Liu, Monk
  2021-03-30  6:59                               ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Liu, Monk @ 2021-03-30  3:10 UTC (permalink / raw)
  To: Koenig, Christian, Zhang, Jack (Jian),
	Grodzovsky, Andrey, Christian König, dri-devel, amd-gfx,
	Deng, Emily, Rob Herring, Tomeu Vizoso, Steven Price
  Cc: Zhang, Andy, Jiang, Jerry (SW)

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,

We don't need to debate the design topic; each of us has our own opinion, and it is sometimes hard to persuade others. Again, with more and more features and requirements it is pretty normal that an old design needs to be
refined or even reworked to satisfy all those needs, so I'm not trying to argue that we don't need a better rework; that would also please me.

At the moment, the thing I care more about is the solution, because the SRIOV project still tries its best to put all changes into the upstream tree; we don't want to fork another tree unless we have no choice ...

Let's have a sync in another thread.

Thanks for your help on this

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com> 
Sent: Friday, March 26, 2021 10:51 PM
To: Liu, Monk <Monk.Liu@amd.com>; Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
Cc: Zhang, Andy <Andy.Zhang@amd.com>; Jiang, Jerry (SW) <Jerry.Jiang@amd.com>
Subject: Re: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

Hi Monk,

I can't disagree more.

The fundamental problem here is that we have pushed a design without validating if it really fits into the concepts the Linux kernel mandates here.

My mistake was that I didn't push back hard enough on the initial design, resulting in numerous cycles of trying to save it while band-aiding the flaws which became obvious after a while.

I haven't counted them, but I think we already have over 10 patches which try to work around lifetime issues of the job object, because I wasn't able to properly explain why this isn't going to work like this.

Because of this I will hard reject any attempt to band-aid this issue even further that doesn't start over with a design which looks like it is going to work.

Regards,
Christian.

On 26.03.21 at 12:21, Liu, Monk wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian
>
> This is not the correct perspective. Any design comes with its pros and
> cons, otherwise it wouldn't have made it into the kernel tree in the first
> place; it is just that, as time passes, we have more and more
> requirements and features to implement. Those new requirements
> drive many new solutions or ideas, and some of the ideas you prefer need to
> be based on a new infrastructure, that's all.
>
> I don't see why the job "should be" or "should not be" in the scheduler.
> Honestly speaking, I could argue that the "scheduler" and the TDR feature, which were invented by AMD developers, "should" never have been escalated to the drm layer at all, and under that assumption those vendor-compatibility headaches we have right now wouldn't happen at all.
>
> Let's just focus on the issue so far.
>
> The solution Andrey and Jack are working on right now looks good to me, and
> on the surface it can solve our problems without introducing regressions.
> It is fine if you need a neater solution, but we have our
> project pressure (which we always have). Either we land the first
> version with Jack's patch and do the revision in another series of
> patches (which was also my initial suggestion), or we rework everything you
> mentioned. But it looks to me like you keep asking
> people to rework something at a stage where they already have a
> solution, which frustrates people a lot.
>
> I would like you to prepare a solution for us which solves our
> headaches ... I really don't want to see you ask Jack to rework again and again. If you are out of bandwidth or have no interest in doing this, please at least make your solution/proposal very detailed and clear; Jack told me he couldn't understand your point here.
>
> Thanks very much, and please understand our pain here
>
> /Monk
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: March 26, 2021 17:06
> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey
> <Andrey.Grodzovsky@amd.com>; Christian König 
> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; 
> amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>; Deng, 
> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu 
> Vizoso <tomeu.vizoso@collabora.com>; Steven Price 
> <steven.price@arm.com>
> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
> memleak
>
> Hi guys,
>
> On 26.03.21 at 03:23, Zhang, Jack (Jian) wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi, Andrey,
>>
>>>> how you handle non-guilty signaled jobs in drm_sched_stop; currently it
>>>> looks like you don't call put for them and just explicitly free
>>>> them as before
>> Good point, I missed that place. Will cover that in my next patch.
>>
>>>> Also sched->free_guilty seems useless with the new approach.
>> Yes, I agree.
>>
>>>> Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>> I am not quite sure about that for now, let me think about this topic today.
>>
>> Hi, Christian,
>> should I add a fence and get/put to that fence rather than using an explicit refcount?
>> Any other concerns?
> well let me re-iterate:
>
> For the scheduler the job is just a temporary data structure used for scheduling the IBs to the hardware.
>
> While pushing the job to the hardware we get a fence structure in return which represents the IBs executing on the hardware.
>
> Unfortunately we have applied a design where the job structure is rather used for re-submitting the jobs to the hardware after a GPU reset and karma handling etc etc...
>
> All that shouldn't have been pushed into the scheduler in the first place and we should now work on getting this cleaned up rather than making it an even bigger mess by applying half-baked solutions.
>
> So in my opinion adding a reference count to the job is going in the completely wrong direction. What we should rather do is fix the incorrect design decision to use jobs as the vehicle in the scheduler for reset handling.
>
> To fix this I suggest the following approach:
> 1. We add a pointer from the drm_sched_fence back to the drm_sched_job.
> 2. Instead of keeping the job around in the scheduler we keep the fence around. For this I suggest to replace the pending_list with a ring buffer.
> 3. The timedout_job callback is replaced with a timeout_fence callback.
> 4. The free_job callback is completely dropped. Job lifetime is now handled in the driver, not the scheduler.
>
> Regards,
> Christian.
>
>> Thanks,
>> Jack
>>
>> -----Original Message-----
>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Sent: Friday, March 26, 2021 12:32 AM
>> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König 
>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; 
>> amd-gfx@lists.freedesktop.org; Koenig, Christian 
>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily 
>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso 
>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid 
>> memleak
>>
>> There are a few issues here, like how you handle non-guilty signaled jobs in drm_sched_stop; currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>>
>> But first - We need Christian to express his opinion on this since I think he opposed refcounting jobs and that we should concentrate on fences instead.
>>
>> Christian - can you chime in here ?
>>
>> Andrey
>>
>> On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>>
>>> Hi, Andrey
>>>
>>> Thank you for your good opinions.
>>>
>>> I literally agree with you that the refcount could handle the concurrent
>>> cleanup of the job gracefully, and there is no need to re-insert the
>>> job back anymore.
>>>
>>> I quickly made a draft for this idea as follows:
>>>
>>> How do you like it? I will start implementing it after I get your
>>> acknowledgement.
>>>
>>> Thanks,
>>>
>>> Jack
>>>
>>> +void drm_job_get(struct drm_sched_job *s_job)
>>>
>>> +{
>>>
>>> +       kref_get(&s_job->refcount);
>>>
>>> +}
>>>
>>> +
>>>
>>> +void drm_job_do_release(struct kref *ref)
>>>
>>> +{
>>>
>>> +       struct drm_sched_job *s_job;
>>>
>>> +       struct drm_gpu_scheduler *sched;
>>>
>>> +
>>>
>>> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>>>
>>> +       sched = s_job->sched;
>>>
>>> +       sched->ops->free_job(s_job);
>>>
>>> +}
>>>
>>> +
>>>
>>> +void drm_job_put(struct drm_sched_job *s_job)
>>>
>>> +{
>>>
>>> +       kref_put(&s_job->refcount, drm_job_do_release);
>>>
>>> +}
>>>
>>> +
>>>
>>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>
>>> {
>>>
>>>            struct drm_gpu_scheduler *sched = s_job->sched;
>>>
>>> +       kref_init(&s_job->refcount);
>>>
>>> +       drm_job_get(s_job);
>>>
>>>            spin_lock(&sched->job_list_lock);
>>>
>>>            list_add_tail(&s_job->node, &sched->ring_mirror_list);
>>>
>>>            drm_sched_start_timeout(sched);
>>>
>>> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct 
>>> work_struct *work)
>>>
>>>                     * drm_sched_cleanup_jobs. It will be reinserted 
>>> back after sched->thread
>>>
>>>                     * is parked at which point it's safe.
>>>
>>>                     */
>>>
>>> -               list_del_init(&job->node);
>>>
>>> +               drm_job_get(job);
>>>
>>>                    spin_unlock(&sched->job_list_lock);
>>>
>>>                    job->sched->ops->timedout_job(job);
>>>
>>> -
>>>
>>> +               drm_job_put(job);
>>>
>>>                    /*
>>>
>>>                     * Guilty job did complete and hence needs to be 
>>> manually removed
>>>
>>>                     * See drm_sched_stop doc.
>>>
>>>                     */
>>>
>>>                    if (sched->free_guilty) {
>>>
>>> -                       job->sched->ops->free_job(job);
>>>
>>>                            sched->free_guilty = false;
>>>
>>>                    }
>>>
>>>            } else {
>>>
>>> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler 
>>> *sched, struct drm_sched_job *bad)
>>>
>>> -       /*
>>>
>>> -        * Reinsert back the bad job here - now it's safe as
>>>
>>> -        * drm_sched_get_cleanup_job cannot race against us and
>>> release the
>>>
>>> -        * bad job at this point - we parked (waited for) any in
>>> progress
>>>
>>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
>>> be called
>>>
>>> -        * now until the scheduler thread is unparked.
>>>
>>> -        */
>>>
>>> -       if (bad && bad->sched == sched)
>>>
>>> -               /*
>>>
>>> -                * Add at the head of the queue to reflect it was the
>>> earliest
>>>
>>> -                * job extracted.
>>>
>>> -                */
>>>
>>> -               list_add(&bad->node, &sched->ring_mirror_list);
>>>
>>> -
>>>
>>>            /*
>>>
>>>             * Iterate the job list from later to  earlier one and 
>>> either deactive
>>>
>>>             * their HW callbacks or remove them from mirror list if 
>>> they already
>>>
>>> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>>>
>>>                                             kthread_should_stop());
>>>
>>>                    if (cleanup_job) {
>>>
>>> -                       sched->ops->free_job(cleanup_job);
>>>
>>> +                       drm_job_put(cleanup_job);
>>>
>>>                            /* queue timeout for next job */
>>>
>>>                            drm_sched_start_timeout(sched);
>>>
>>>                    }
>>>
>>> diff --git a/include/drm/gpu_scheduler.h 
>>> b/include/drm/gpu_scheduler.h
>>>
>>> index 5a1f068af1c2..b80513eec90f 100644
>>>
>>> --- a/include/drm/gpu_scheduler.h
>>>
>>> +++ b/include/drm/gpu_scheduler.h
>>>
>>> @@ -188,6 +188,7 @@ struct drm_sched_fence 
>>> *to_drm_sched_fence(struct dma_fence *f);
>>>
>>>      * to schedule the job.
>>>
>>>      */
>>>
>>> struct drm_sched_job {
>>>
>>> +       struct kref                     refcount;
>>>
>>>            struct spsc_node                queue_node;
>>>
>>>            struct drm_gpu_scheduler        *sched;
>>>
>>>            struct drm_sched_fence          *s_fence;
>>>
>>> @@ -198,6 +199,7 @@ struct drm_sched_job {
>>>
>>>            enum drm_sched_priority         s_priority;
>>>
>>>            struct drm_sched_entity  *entity;
>>>
>>>            struct dma_fence_cb             cb;
>>>
>>> +
>>>
>>> };
>>>
>>> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> *Sent:* Friday, March 19, 2021 12:17 AM
>>> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König 
>>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; 
>>> amd-gfx@lists.freedesktop.org; Koenig, Christian 
>>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, 
>>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu 
>>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price 
>>> <steven.price@arm.com>
>>> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to 
>>> avoid memleak
>>>
>>> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       Hi, Andrey
>>>
>>>       Let me summarize the background of this patch:
>>>
>>>       In the TDR resubmit step, “amdgpu_device_recheck_guilty_jobs”,
>>>
>>>       we submit the first job of each ring and re-check which job is guilty.
>>>
>>>       At that point, we had to make sure each job is in the mirror list (or
>>>       re-inserted back already).
>>>
>>>       But we found that the current code never re-inserts the job into the
>>>       mirror list in the 2^nd, 3^rd job_timeout threads (bailing TDR threads).
>>>
>>>       This will not only cause a memleak of the bailing jobs; what’s more
>>>       important, the 1^st tdr thread can never iterate over the bailing job and
>>>       set its guilty status to a correct status.
>>>
>>>       Therefore, we had to re-insert the job (or not even delete the node) for
>>>       the bailing job.
>>>
>>>       For the above V3 patch, the racing condition in my mind is:
>>>
>>>       we cannot make sure all bailing jobs are finished before we do
>>>       amdgpu_device_recheck_guilty_jobs.
>>>
>>> Yes, that race I missed - so you say that for the 2nd, bailing thread who
>>> extracted the job, even if it reinserts the job right away after the
>>> driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small
>>> time slot where the job is not in the mirror list, and so the 1st TDR
>>> might miss it and not find that the 2nd job is the actual guilty job,
>>> right? But still, this job will get back into the mirror list, and
>>> since it's really the bad job, it will never signal completion, and
>>> so on the next timeout cycle it will be caught (of course there is a
>>> starvation scenario here if more TDRs kick in and it bails out again, but this is really unlikely).
>>>
>>>       Based on this insight, I think we have two options to solve this issue:
>>>
>>>        1. Skip delete node in tdr thread2, thread3, 4 … (using mutex or
>>>           atomic variable)
>>>        2. Re-insert back bailing job, and meanwhile use semaphore in each
>>>           tdr thread to keep the sequence as expected and ensure each job
>>>           is in the mirror list when do resubmit step.
>>>
>>>       For Option1, logic is simpler and we need only one global atomic
>>>       variable:
>>>
>>>       What do you think about this plan?
>>>
>>>       Option1 should look like the following logic:
>>>
>>>       +static atomic_t in_reset;             //a global atomic var for
>>>       synchronization
>>>
>>>       static void drm_sched_process_job(struct dma_fence *f, struct
>>>       dma_fence_cb *cb);
>>>
>>>         /**
>>>
>>>       @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>>>       work_struct *work)
>>>
>>>                         * drm_sched_cleanup_jobs. It will be reinserted
>>>       back after sched->thread
>>>
>>>                         * is parked at which point it's safe.
>>>
>>>                         */
>>>
>>>       +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>>>       delete node if it’s thead1,2,3,….
>>>
>>>       +                       spin_unlock(&sched->job_list_lock);
>>>
>>>       +                       drm_sched_start_timeout(sched);
>>>
>>>       +                       return;
>>>
>>>       +               }
>>>
>>>       +
>>>
>>>                        list_del_init(&job->node);
>>>
>>>                        spin_unlock(&sched->job_list_lock);
>>>
>>>       @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>>>       work_struct *work)
>>>
>>>                spin_lock(&sched->job_list_lock);
>>>
>>>                drm_sched_start_timeout(sched);
>>>
>>>                spin_unlock(&sched->job_list_lock);
>>>
>>>       +       atomic_set(&in_reset, 0); //reset in_reset when the first
>>>       thread finished tdr
>>>
>>>       }
>>>
>>> Technically it looks like it should work, since you don't access the job
>>> pointer any longer and so there is no risk that it will be freed by
>>> drm_sched_get_cleanup_job if it signals. But you can't just use one global
>>> variable and bail from TDR based on it when different drivers run
>>> their TDR threads in parallel, and even for amdgpu, when devices are in
>>> different XGMI hives or are 2 independent devices in a non-XGMI setup.
>>> Some kind of GPU reset group structure should be defined at the
>>> drm_scheduler level for which this variable would be used.
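
A minimal sketch of what such a per-domain flag could look like, purely illustrative: every name below (drm_sched_reset_domain, drm_sched_try_enter_reset, drm_sched_leave_reset) is an assumption made for this discussion, not existing drm_scheduler API.

    #include <linux/atomic.h>

    /*
     * Hypothetical reset group shared by all schedulers of one device or one
     * XGMI hive, replacing the single global in_reset variable from Option 1.
     */
    struct drm_sched_reset_domain {
            atomic_t in_reset;      /* 1 while one TDR thread owns the reset */
    };

    /* The first TDR thread in the domain wins; the others bail out. */
    static bool drm_sched_try_enter_reset(struct drm_sched_reset_domain *domain)
    {
            return atomic_cmpxchg(&domain->in_reset, 0, 1) == 0;
    }

    static void drm_sched_leave_reset(struct drm_sched_reset_domain *domain)
    {
            atomic_set(&domain->in_reset, 0);
    }

Each drm_gpu_scheduler would then point at the domain it belongs to, so two independent devices (or two different XGMI hives) would never block each other's TDR handling.
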
>>>
>>> P.S I wonder why we can't just ref-count the job so that even if 
>>> drm_sched_get_cleanup_job would delete it before we had a chance to 
>>> stop the scheduler thread, we wouldn't crash. This would avoid all 
>>> the dance with deletion and reinsertion.
>>>
>>> Andrey
>>>
>>>       Thanks,
>>>
>>>       Jack
>>>
>>>       *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>>>       <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>>>       Jack (Jian)
>>>       *Sent:* Wednesday, March 17, 2021 11:11 PM
>>>       *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>>>       <mailto:ckoenig.leichtzumerken@gmail.com>;
>>>       dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org
>>>       <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>>>       <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>>>       Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>>>       <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>>>       <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>>>       <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>>>       Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>>>       Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>>       <mailto:Andrey.Grodzovsky@amd.com>
>>>       *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid memleak
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       [AMD Official Use Only - Internal Distribution Only]
>>>
>>>       Hi,Andrey,
>>>
>>>       Good catch,I will expore this corner case and give feedback
>>> soon~
>>>
>>>       Best,
>>>
>>>       Jack
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>       *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>>>       <mailto:Andrey.Grodzovsky@amd.com>>
>>>       *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>>>       *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>>>       <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>>>       <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>       dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>
>>>       <dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>       <amd-gfx@lists.freedesktop.org
>>>       <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>>>       <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>>>       Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>>>       <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>>>       <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>>>       <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>>>       Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>       *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid memleak
>>>
>>>       I actually have a race condition concern here - see bellow -
>>>
>>>       On 2021-03-17 3:43 a.m., Christian König wrote:
>>>        > I was hoping Andrey would take a look since I'm really busy with
>>>       other
>>>        > work right now.
>>>        >
>>>        > Regards,
>>>        > Christian.
>>>        >
>>>        > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>>>        >> Hi, Andrey/Crhistian and Team,
>>>        >>
>>>        >> I didn't receive the reviewer's message from maintainers on
>>>       panfrost
>>>        >> driver for several days.
>>>        >> Due to this patch is urgent for my current working project.
>>>        >> Would you please help to give some review ideas?
>>>        >>
>>>        >> Many Thanks,
>>>        >> Jack
>>>        >> -----Original Message-----
>>>        >> From: Zhang, Jack (Jian)
>>>        >> Sent: Tuesday, March 16, 2021 3:20 PM
>>>        >> To: dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>       Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>        >> Vizoso <tomeu.vizoso@collabora.com
>>>       <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>       <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>        >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid
>>>        >> memleak
>>>        >>
>>>        >> [AMD Public Use]
>>>        >>
>>>        >> Ping
>>>        >>
>>>        >> -----Original Message-----
>>>        >> From: Zhang, Jack (Jian)
>>>        >> Sent: Monday, March 15, 2021 1:24 PM
>>>        >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>        >> dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>       Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>        >> Vizoso <tomeu.vizoso@collabora.com
>>>       <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>       <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>        >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>       avoid
>>>        >> memleak
>>>        >>
>>>        >> [AMD Public Use]
>>>        >>
>>>        >> Hi, Rob/Tomeu/Steven,
>>>        >>
>>>        >> Would you please help to review this patch for panfrost driver?
>>>        >>
>>>        >> Thanks,
>>>        >> Jack Zhang
>>>        >>
>>>        >> -----Original Message-----
>>>        >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>>>        >> Sent: Monday, March 15, 2021 1:21 PM
>>>        >> To: dri-devel@lists.freedesktop.org
>>>       <mailto:dri-devel@lists.freedesktop.org>;
>>>       amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>        >> Koenig, Christian <Christian.Koenig@amd.com
>>>       <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>        >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>       Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>        >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>>>        >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>>>       <mailto:Jack.Zhang1@amd.com>>
>>>        >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>>       memleak
>>>        >>
>>>        >> re-insert Bailing jobs to avoid memory leak.
>>>        >>
>>>        >> V2: move re-insert step to drm/scheduler logic
>>>        >> V3: add panfrost's return value for bailing jobs in case it hits
>>>       the
>>>        >> memleak issue.
>>>        >>
>>>        >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>>>       <mailto:Jack.Zhang1@amd.com>>
>>>        >> ---
>>>        >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>>        >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>>        >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>>        >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>>        >>   include/drm/gpu_scheduler.h                | 1 +
>>>        >>   5 files changed, 19 insertions(+), 6 deletions(-)
>>>        >>
>>>        >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> index 79b9cc73763f..86463b0f936e 100644
>>>        >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>        >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>>>        >> amdgpu_device *adev,
>>>        >>                       job ? job->base.id : -1);
>>>        >>             /* even we skipped this reset, still need to set the
>>>       job
>>>        >> to guilty */
>>>        >> -        if (job)
>>>        >> +        if (job) {
>>>        >>               drm_sched_increase_karma(&job->base);
>>>        >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>>>        >> +        }
>>>        >>           goto skip_recovery;
>>>        >>       }
>>>        >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> index 759b34799221..41390bdacd9e 100644
>>>        >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>        >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>>>        >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>        >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>>        >>       struct amdgpu_task_info ti;
>>>        >>       struct amdgpu_device *adev = ring->adev;
>>>        >> +    int ret;
>>>        >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>        >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>>>        >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>        >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>>        >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>>        >> -        amdgpu_device_gpu_recover(ring->adev, job);
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>>>        >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>>>        >> +            return DRM_GPU_SCHED_STAT_BAILING;
>>>        >> +        else
>>>        >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >>       } else {
>>>        >>           drm_sched_suspend_timeout(&ring->sched);
>>>        >>           if (amdgpu_sriov_vf(adev))
>>>        >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> index 6003cfeb1322..e2cb4f32dae1 100644
>>>        >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>        >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>>>        >> panfrost_job_timedout(struct drm_sched_job
>>>        >>        * spurious. Bail out.
>>>        >>        */
>>>        >>       if (dma_fence_is_signaled(job->done_fence))
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>        >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>>>        >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>>        >>           js,
>>>        >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>>>        >> panfrost_job_timedout(struct drm_sched_job
>>>        >>         /* Scheduler is already stopped, nothing to do. */
>>>        >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>>>       sched_job))
>>>        >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>        >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>        >>         /* Schedule a reset if there's no reset in progress. */
>>>        >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>>>        >> a/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> b/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> index 92d8de24d0a1..a44f621fb5c4 100644
>>>        >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>        >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>>>        >> work_struct *work)  {
>>>        >>       struct drm_gpu_scheduler *sched;
>>>        >>       struct drm_sched_job *job;
>>>        >> +    int ret;
>>>        >>         sched = container_of(work, struct drm_gpu_scheduler,
>>>        >> work_tdr.work);
>>>        >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>>>        >> work_struct *work)
>>>        >>           list_del_init(&job->list);
>>>        >>           spin_unlock(&sched->job_list_lock);
>>>        >>   -        job->sched->ops->timedout_job(job);
>>>        >> +        ret = job->sched->ops->timedout_job(job);
>>>        >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>>>        >> +            spin_lock(&sched->job_list_lock);
>>>        >> +            list_add(&job->node, &sched->ring_mirror_list);
>>>        >> +            spin_unlock(&sched->job_list_lock);
>>>        >> +        }
>>>
>>>
>>>       At this point we don't hold the GPU reset locks anymore, and so we
>>>       could be racing against another TDR thread from another scheduler ring
>>>       of the same device or another XGMI hive member. The other thread might
>>>       be in the middle of a lockless iteration of the mirror list
>>>       (drm_sched_stop, drm_sched_start and drm_sched_resubmit), and so
>>>       locking job_list_lock will not help. It looks like it's required to
>>>       take all GPU reset locks here.
>>>
>>>       Andrey
>>>
>>>
>>>        >>           /*
>>>        >>            * Guilty job did complete and hence needs to be manually
>>>        >> removed
>>>        >>            * See drm_sched_stop doc.
>>>        >> diff --git a/include/drm/gpu_scheduler.h
>>>        >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>>>       100644
>>>        >> --- a/include/drm/gpu_scheduler.h
>>>        >> +++ b/include/drm/gpu_scheduler.h
>>>        >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>>        >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>>        >>       DRM_GPU_SCHED_STAT_NOMINAL,
>>>        >>       DRM_GPU_SCHED_STAT_ENODEV,
>>>        >> +    DRM_GPU_SCHED_STAT_BAILING,
>>>        >>   };
>>>        >>     /**
>>>        >> --
>>>        >> 2.25.1
>>>        >> _______________________________________________
>>>        >> amd-gfx mailing list
>>>        >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>        >>
>>>
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>        >>
>>>        >
>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 回复: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
  2021-03-30  3:10                             ` Liu, Monk
@ 2021-03-30  6:59                               ` Christian König
  0 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2021-03-30  6:59 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian, Zhang, Jack (Jian),
	Grodzovsky, Andrey, dri-devel, amd-gfx, Deng, Emily, Rob Herring,
	Tomeu Vizoso, Steven Price
  Cc: Zhang, Andy, Jiang, Jerry (SW)

Hi Monk,

yeah, that's what I can certainly agree on.

My primary concern is that I'm not convinced we won't get problems in
other places if we just add another band-aid.

We have already had this back and forth multiple times now, and while we are
currently under time pressure, we will be under even more time pressure
when a customer runs into other issues and we are still circling
around the same fundamental problem.

Regards,
Christian.

Am 30.03.21 um 05:10 schrieb Liu, Monk:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian,
>
> We don't need to debate the design topic; each of us has our own opinion, and it is sometimes hard to persuade others. Again, with more and more features and requirements it is pretty normal that an old design needs to be
> refined or even reworked to satisfy all those needs, so I'm not trying to argue with you that we don't need a better rework; that would also please me.
>
> For the moment, the more important thing I care about is the solution, because the SRIOV project still tries its best to put all changes into the upstream tree; we don't want to fork another tree unless there is no choice ...
>
> Let's have a sync in another thread
>
> Thanks for your help on this
>
> ------------------------------------------
> Monk Liu | Cloud-GPU Core team
> ------------------------------------------
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Friday, March 26, 2021 10:51 PM
> To: Liu, Monk <Monk.Liu@amd.com>; Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Deng, Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
> Cc: Zhang, Andy <Andy.Zhang@amd.com>; Jiang, Jerry (SW) <Jerry.Jiang@amd.com>
> Subject: Re: 回复: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> Hi Monk,
>
> I can't disagree more.
>
> The fundamental problem here is that we have pushed a design without validating if it really fits into the concepts the Linux kernel mandates here.
>
> My mistake was that I haven't pushed back hard enough on the initial design resulting in numerous cycles of trying to save the design while band aiding the flaws which became obvious after a while.
>
> I haven't counted them, but I think we have by now already had over 10 patches which try to work around lifetime issues of the job object, because I wasn't able to properly explain why this isn't going to work like this.
>
> Because of this I will hard-reject any attempt to band-aid this issue even further that isn't starting over again with a design which looks like it is going to work.
>
> Regards,
> Christian.
>
> Am 26.03.21 um 12:21 schrieb Liu, Monk:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Christian
>>
>> This is not a correct perspective: any design comes with its
>> pros and cons, otherwise it wouldn't have come to the kernel tree in the very
>> beginning. It is just that, with time passed, we have more and more
>> requirements and features to implement, and those new requirements
>> drive many new solutions and ideas, and some ideas you prefer need to
>> be based on a new infrastructure; that's all.
>>
>> I don't see why the job "should be" or "should not be" in the scheduler;
>> honestly speaking, I could argue with you that the "scheduler" and the TDR feature, which were invented by AMD developers, "should" never have escalated to the drm layer at all, and under that assumption those vendors' compatibility headaches right now wouldn't happen at all.
>>
>> Let's just focus on the issue so far.
>>
>> The solution Andrey and Jack are doing right now looks good to me, and on
>> the surface it can solve our problems without introducing a regression,
>> but it is fine if you need a neater solution. Since we have our
>> project pressure (which we always have), either we implement the first
>> version with Jack's patch and do the revision in another series of
>> patches (that was also my initial suggestion), or we rework everything you
>> mentioned. But it looks to me like you are, from time to time, asking
>> people to rework something at a stage where people already have a
>> solution, which frustrates people a lot.
>>
>> I would like you to prepare a solution for us which solves our
>> headaches ... I really don't want to see you ask Jack to rework again and again. If you are out of bandwidth or have no interest in doing this, please at least make your solution/proposal very detailed and clear; Jack told me he couldn't understand your point here.
>>
>> Thanks very much, and please understand our pain here
>>
>> /Monk
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: March 26, 2021 17:06
>> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky@amd.com>; Christian König
>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>> amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>; Deng,
>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu
>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price
>> <steven.price@arm.com>
>> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>> memleak
>>
>> Hi guys,
>>
>> Am 26.03.21 um 03:23 schrieb Zhang, Jack (Jian):
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Hi, Andrey,
>>>
>>>>> how you handle non-guilty signaled jobs in drm_sched_stop; currently it
>>>>> looks like you don't call put for them and just explicitly free
>>>>> them as before
>>> Good point, I missed that place. Will cover that in my next patch.
>>>
>>>>> Also sched->free_guilty seems useless with the new approach.
>>> Yes, I agree.
>>>
>>>>> Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>>> I am not quite sure about that for now, let me think about this topic today.
>>>
>>> Hi, Christian,
>>> should I add a fence and get/put to that fence rather than using an explicit refcount?
>>> Any other concerns?
>> well let me re-iterate:
>>
>> For the scheduler the job is just a temporary data structure used for scheduling the IBs to the hardware.
>>
>> While pushing the job to the hardware we get a fence structure in return which represents the IBs executing on the hardware.
>>
>> Unfortunately we have applied a design where the job structure is rather used for re-submitting the jobs to the hardware after a GPU reset and karma handling etc etc...
>>
>> All that shouldn't have been pushed into the scheduler in the first place, and we should now work on getting this cleaned up rather than making it an even bigger mess by applying half-baked solutions.
>>
>> So in my opinion adding a reference count to the job is going in the completely wrong direction. What we should rather do is fix the incorrect design decision to use jobs as a vehicle in the scheduler for reset handling.
>>
>> To fix this I suggest the following approach:
>> 1. We add a pointer from the drm_sched_fence back to the drm_sched_job.
>> 2. Instead of keeping the job around in the scheduler we keep the fence around. For this I suggest to replace the pending_list with a ring buffer.
>> 3. The timedout_job callback is replaced with a timeout_fence callback.
>> 4. The free_job callback is completely dropped. Job lifetime is now handled in the driver, not the scheduler.
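
A rough sketch of how points 1-4 above could shape the data structures; this is only an illustration of the direction, and every name below (drm_sched_fence_sketch, drm_gpu_scheduler_sketch, timedout_fence, DRM_SCHED_PENDING_RING) is an assumption rather than an actual drm_scheduler interface:

    #include <drm/gpu_scheduler.h>
    #include <linux/dma-fence.h>

    struct drm_sched_job;

    /* (1) the scheduler fence carries a back-pointer to its job */
    struct drm_sched_fence_sketch {
            struct dma_fence        finished;
            struct drm_sched_job    *job;
    };

    /* (2) the scheduler keeps a ring buffer of hardware fences instead of a
     * pending list of jobs
     */
    #define DRM_SCHED_PENDING_RING  64

    struct drm_gpu_scheduler_sketch {
            struct dma_fence        *pending[DRM_SCHED_PENDING_RING];
            unsigned int            head, tail;
    };

    struct drm_sched_backend_ops_sketch {
            /* (3) the timeout is reported against the fence ... */
            enum drm_gpu_sched_stat (*timedout_fence)(struct dma_fence *fence);
            /* (4) ... and there is no free_job callback: the driver owns the
             * job's lifetime and frees it whenever it sees fit
             */
    };

With that shape the scheduler would no longer need to dereference a job after submission, so the re-insert/refcount question would not arise; the trade-off is that every driver has to manage job lifetime itself.
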
>>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>> Jack
>>>
>>> -----Original Message-----
>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> Sent: Friday, March 26, 2021 12:32 AM
>>> To: Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng, Emily
>>> <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu Vizoso
>>> <tomeu.vizoso@collabora.com>; Steven Price <steven.price@arm.com>
>>> Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>> memleak
>>>
>>> There are a few issues here, like how you handle non-guilty signaled jobs in drm_sched_stop; currently it looks like you don't call put for them and just explicitly free them as before. Also sched->free_guilty seems useless with the new approach. Do we even need the cleanup mechanism at drm_sched_get_cleanup_job with this approach...
>>>
>>> But first - we need Christian to express his opinion on this, since I think he opposed refcounting jobs and suggested we should concentrate on fences instead.
>>>
>>> Christian - can you chime in here ?
>>>
>>> Andrey
>>>
>>> On 2021-03-25 5:51 a.m., Zhang, Jack (Jian) wrote:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>
>>>> Hi, Andrey
>>>>
>>>> Thank you for your good opinions.
>>>>
>>>> I agree with you that the refcount could gracefully handle the concurrent
>>>> cleanup case (drm_sched_get_cleanup_job racing with TDR), with no need to
>>>> re-insert the job back anymore.
>>>>
>>>> I quickly made a draft for this idea as follows:
>>>>
>>>> How do you like it? I will start implementing it after I get your
>>>> acknowledgement.
>>>>
>>>> Thanks,
>>>>
>>>> Jack
>>>>
>>>> +void drm_job_get(struct drm_sched_job *s_job)
>>>>
>>>> +{
>>>>
>>>> +       kref_get(&s_job->refcount);
>>>>
>>>> +}
>>>>
>>>> +
>>>>
>>>> +void drm_job_do_release(struct kref *ref)
>>>>
>>>> +{
>>>>
>>>> +       struct drm_sched_job *s_job;
>>>>
>>>> +       struct drm_gpu_scheduler *sched;
>>>>
>>>> +
>>>>
>>>> +       s_job = container_of(ref, struct drm_sched_job, refcount);
>>>>
>>>> +       sched = s_job->sched;
>>>>
>>>> +       sched->ops->free_job(s_job);
>>>>
>>>> +}
>>>>
>>>> +
>>>>
>>>> +void drm_job_put(struct drm_sched_job *s_job)
>>>>
>>>> +{
>>>>
>>>> +       kref_put(&s_job->refcount, drm_job_do_release);
>>>>
>>>> +}
>>>>
>>>> +
>>>>
>>>> static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>>
>>>> {
>>>>
>>>>             struct drm_gpu_scheduler *sched = s_job->sched;
>>>>
>>>> +       kref_init(&s_job->refcount);
>>>>
>>>> +       drm_job_get(s_job);
>>>>
>>>>             spin_lock(&sched->job_list_lock);
>>>>
>>>>             list_add_tail(&s_job->node, &sched->ring_mirror_list);
>>>>
>>>>             drm_sched_start_timeout(sched);
>>>>
>>>> @@ -294,17 +316,16 @@ static void drm_sched_job_timedout(struct
>>>> work_struct *work)
>>>>
>>>>                      * drm_sched_cleanup_jobs. It will be reinserted
>>>> back after sched->thread
>>>>
>>>>                      * is parked at which point it's safe.
>>>>
>>>>                      */
>>>>
>>>> -               list_del_init(&job->node);
>>>>
>>>> +               drm_job_get(job);
>>>>
>>>>                     spin_unlock(&sched->job_list_lock);
>>>>
>>>>                     job->sched->ops->timedout_job(job);
>>>>
>>>> -
>>>>
>>>> +               drm_job_put(job);
>>>>
>>>>                     /*
>>>>
>>>>                      * Guilty job did complete and hence needs to be
>>>> manually removed
>>>>
>>>>                      * See drm_sched_stop doc.
>>>>
>>>>                      */
>>>>
>>>>                     if (sched->free_guilty) {
>>>>
>>>> -                       job->sched->ops->free_job(job);
>>>>
>>>>                             sched->free_guilty = false;
>>>>
>>>>                     }
>>>>
>>>>             } else {
>>>>
>>>> @@ -355,20 +376,6 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>> *sched, struct drm_sched_job *bad)
>>>>
>>>> -       /*
>>>>
>>>> -        * Reinsert back the bad job here - now it's safe as
>>>>
>>>> -        * drm_sched_get_cleanup_job cannot race against us and
>>>> release the
>>>>
>>>> -        * bad job at this point - we parked (waited for) any in
>>>> progress
>>>>
>>>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not
>>>> be called
>>>>
>>>> -        * now until the scheduler thread is unparked.
>>>>
>>>> -        */
>>>>
>>>> -       if (bad && bad->sched == sched)
>>>>
>>>> -               /*
>>>>
>>>> -                * Add at the head of the queue to reflect it was the
>>>> earliest
>>>>
>>>> -                * job extracted.
>>>>
>>>> -                */
>>>>
>>>> -               list_add(&bad->node, &sched->ring_mirror_list);
>>>>
>>>> -
>>>>
>>>>             /*
>>>>
>>>>              * Iterate the job list from later to  earlier one and
>>>> either deactive
>>>>
>>>>              * their HW callbacks or remove them from mirror list if
>>>> they already
>>>>
>>>> @@ -774,7 +781,7 @@ static int drm_sched_main(void *param)
>>>>
>>>>                                              kthread_should_stop());
>>>>
>>>>                     if (cleanup_job) {
>>>>
>>>> -                       sched->ops->free_job(cleanup_job);
>>>>
>>>> +                       drm_job_put(cleanup_job);
>>>>
>>>>                             /* queue timeout for next job */
>>>>
>>>>                             drm_sched_start_timeout(sched);
>>>>
>>>>                     }
>>>>
>>>> diff --git a/include/drm/gpu_scheduler.h
>>>> b/include/drm/gpu_scheduler.h
>>>>
>>>> index 5a1f068af1c2..b80513eec90f 100644
>>>>
>>>> --- a/include/drm/gpu_scheduler.h
>>>>
>>>> +++ b/include/drm/gpu_scheduler.h
>>>>
>>>> @@ -188,6 +188,7 @@ struct drm_sched_fence
>>>> *to_drm_sched_fence(struct dma_fence *f);
>>>>
>>>>       * to schedule the job.
>>>>
>>>>       */
>>>>
>>>> struct drm_sched_job {
>>>>
>>>> +       struct kref                     refcount;
>>>>
>>>>             struct spsc_node                queue_node;
>>>>
>>>>             struct drm_gpu_scheduler        *sched;
>>>>
>>>>             struct drm_sched_fence          *s_fence;
>>>>
>>>> @@ -198,6 +199,7 @@ struct drm_sched_job {
>>>>
>>>>             enum drm_sched_priority         s_priority;
>>>>
>>>>             struct drm_sched_entity  *entity;
>>>>
>>>>             struct dma_fence_cb             cb;
>>>>
>>>> +
>>>>
>>>> };
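
As a side note on the kref usage in the draft above: kref_init() starts the count at 1, kref_get() takes an additional reference, and the kref_put() that drops the count to 0 invokes the release function. A tiny standalone illustration of that lifecycle (demo_job and demo_job_release are made-up names, not part of the patch):

    #include <linux/kref.h>
    #include <linux/slab.h>

    struct demo_job {
            struct kref refcount;
            /* ... payload ... */
    };

    static void demo_job_release(struct kref *ref)
    {
            struct demo_job *job = container_of(ref, struct demo_job, refcount);

            kfree(job);     /* runs only once the last reference is dropped */
    }

    static void demo_job_lifecycle(void)
    {
            struct demo_job *job = kzalloc(sizeof(*job), GFP_KERNEL);

            if (!job)
                    return;

            kref_init(&job->refcount);                      /* count = 1 */
            kref_get(&job->refcount);                       /* count = 2, e.g. the TDR thread holds it */
            kref_put(&job->refcount, demo_job_release);     /* count = 1, TDR done */
            kref_put(&job->refcount, demo_job_release);     /* count = 0, demo_job_release() frees it */
    }
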
>>>>
>>>> *From:* Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>>> *Sent:* Friday, March 19, 2021 12:17 AM
>>>> *To:* Zhang, Jack (Jian) <Jack.Zhang1@amd.com>; Christian König
>>>> <ckoenig.leichtzumerken@gmail.com>; dri-devel@lists.freedesktop.org;
>>>> amd-gfx@lists.freedesktop.org; Koenig, Christian
>>>> <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Deng,
>>>> Emily <Emily.Deng@amd.com>; Rob Herring <robh@kernel.org>; Tomeu
>>>> Vizoso <tomeu.vizoso@collabora.com>; Steven Price
>>>> <steven.price@arm.com>
>>>> *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>> avoid memleak
>>>>
>>>> On 2021-03-18 6:41 a.m., Zhang, Jack (Jian) wrote:
>>>>
>>>>        [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>        Hi, Andrey
>>>>
>>>>        Let me summarize the background of this patch:
>>>>
>>>>        In TDR resubmit step “amdgpu_device_recheck_guilty_jobs,
>>>>
>>>>        It will submit first jobs of each ring and do guilty job re-check.
>>>>
>>>>        At that point, We had to make sure each job is in the mirror list(or
>>>>        re-inserted back already).
>>>>
>>>>        But we found the current code never re-insert the job to mirror list
>>>>        in the 2^nd , 3^rd job_timeout thread(Bailing TDR thread).
>>>>
>>>>        This not only will cause memleak of the bailing jobs. What’s more
>>>>        important, the 1^st tdr thread can never iterate the bailing job and
>>>>        set its guilty status to a correct status.
>>>>
>>>>        Therefore, we had to re-insert the job(or even not delete node) for
>>>>        bailing job.
>>>>
>>>>        For the above V3 patch, the racing condition in my mind is:
>>>>
>>>>        we cannot make sure all bailing jobs are finished before we do
>>>>        amdgpu_device_recheck_guilty_jobs.
>>>>
>>>> Yes, that race I missed - so you say that for the 2nd, bailing thread who
>>>> extracted the job, even if it reinserts the job right away after the
>>>> driver callback returns DRM_GPU_SCHED_STAT_BAILING, there is a small
>>>> time slot where the job is not in the mirror list, and so the 1st TDR
>>>> might miss it and not find that the 2nd job is the actual guilty job,
>>>> right? But still, this job will get back into the mirror list, and
>>>> since it's really the bad job, it will never signal completion, and
>>>> so on the next timeout cycle it will be caught (of course there is a
>>>> starvation scenario here if more TDRs kick in and it bails out again, but this is really unlikely).
>>>>
>>>>        Based on this insight, I think we have two options to solve this issue:
>>>>
>>>>         1. Skip delete node in tdr thread2, thread3, 4 … (using mutex or
>>>>            atomic variable)
>>>>         2. Re-insert back bailing job, and meanwhile use semaphore in each
>>>>            tdr thread to keep the sequence as expected and ensure each job
>>>>            is in the mirror list when do resubmit step.
>>>>
>>>>        For Option1, logic is simpler and we need only one global atomic
>>>>        variable:
>>>>
>>>>        What do you think about this plan?
>>>>
>>>>        Option1 should look like the following logic:
>>>>
>>>>        +static atomic_t in_reset;             //a global atomic var for
>>>>        synchronization
>>>>
>>>>        static void drm_sched_process_job(struct dma_fence *f, struct
>>>>        dma_fence_cb *cb);
>>>>
>>>>          /**
>>>>
>>>>        @@ -295,6 +296,12 @@ static void drm_sched_job_timedout(struct
>>>>        work_struct *work)
>>>>
>>>>                          * drm_sched_cleanup_jobs. It will be reinserted
>>>>        back after sched->thread
>>>>
>>>>                          * is parked at which point it's safe.
>>>>
>>>>                          */
>>>>
>>>>        +               if (atomic_cmpxchg(&in_reset, 0, 1) != 0) {  //skip
>>>>        delete node if it’s thead1,2,3,….
>>>>
>>>>        +                       spin_unlock(&sched->job_list_lock);
>>>>
>>>>        +                       drm_sched_start_timeout(sched);
>>>>
>>>>        +                       return;
>>>>
>>>>        +               }
>>>>
>>>>        +
>>>>
>>>>                         list_del_init(&job->node);
>>>>
>>>>                         spin_unlock(&sched->job_list_lock);
>>>>
>>>>        @@ -320,6 +327,7 @@ static void drm_sched_job_timedout(struct
>>>>        work_struct *work)
>>>>
>>>>                 spin_lock(&sched->job_list_lock);
>>>>
>>>>                 drm_sched_start_timeout(sched);
>>>>
>>>>                 spin_unlock(&sched->job_list_lock);
>>>>
>>>>        +       atomic_set(&in_reset, 0); //reset in_reset when the first
>>>>        thread finished tdr
>>>>
>>>>        }
>>>>
>>>> Technically it looks like it should work, since you don't access the job
>>>> pointer any longer and so there is no risk that it will be freed by
>>>> drm_sched_get_cleanup_job if it signals. But you can't just use one global
>>>> variable and bail from TDR based on it when different drivers run
>>>> their TDR threads in parallel, and even for amdgpu, when devices are in
>>>> different XGMI hives or are 2 independent devices in a non-XGMI setup.
>>>> Some kind of GPU reset group structure should be defined at the
>>>> drm_scheduler level for which this variable would be used.
>>>>
>>>> P.S I wonder why we can't just ref-count the job so that even if
>>>> drm_sched_get_cleanup_job would delete it before we had a chance to
>>>> stop the scheduler thread, we wouldn't crash. This would avoid all
>>>> the dance with deletion and reinsertion.
>>>>
>>>> Andrey
>>>>
>>>>        Thanks,
>>>>
>>>>        Jack
>>>>
>>>>        *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>>>>        <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Zhang,
>>>>        Jack (Jian)
>>>>        *Sent:* Wednesday, March 17, 2021 11:11 PM
>>>>        *To:* Christian König <ckoenig.leichtzumerken@gmail.com>
>>>>        <mailto:ckoenig.leichtzumerken@gmail.com>;
>>>>        dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>;
>>>>        amd-gfx@lists.freedesktop.org
>>>>        <mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian
>>>>        <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>; Liu,
>>>>        Monk <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com>; Deng, Emily
>>>>        <Emily.Deng@amd.com> <mailto:Emily.Deng@amd.com>; Rob Herring
>>>>        <robh@kernel.org> <mailto:robh@kernel.org>; Tomeu Vizoso
>>>>        <tomeu.vizoso@collabora.com> <mailto:tomeu.vizoso@collabora.com>;
>>>>        Steven Price <steven.price@arm.com> <mailto:steven.price@arm.com>;
>>>>        Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>>>        <mailto:Andrey.Grodzovsky@amd.com>
>>>>        *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>>        avoid memleak
>>>>
>>>>        [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>        [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>>        Hi,Andrey,
>>>>
>>>>        Good catch,I will expore this corner case and give feedback
>>>> soon~
>>>>
>>>>        Best,
>>>>
>>>>        Jack
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>        *From:*Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com
>>>>        <mailto:Andrey.Grodzovsky@amd.com>>
>>>>        *Sent:* Wednesday, March 17, 2021 10:50:59 PM
>>>>        *To:* Christian König <ckoenig.leichtzumerken@gmail.com
>>>>        <mailto:ckoenig.leichtzumerken@gmail.com>>; Zhang, Jack (Jian)
>>>>        <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>>        dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>
>>>>        <dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>>;
>>>>        amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>>        <amd-gfx@lists.freedesktop.org
>>>>        <mailto:amd-gfx@lists.freedesktop.org>>; Koenig, Christian
>>>>        <Christian.Koenig@amd.com <mailto:Christian.Koenig@amd.com>>; Liu,
>>>>        Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng, Emily
>>>>        <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob Herring
>>>>        <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu Vizoso
>>>>        <tomeu.vizoso@collabora.com <mailto:tomeu.vizoso@collabora.com>>;
>>>>        Steven Price <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>>        *Subject:* Re: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>>        avoid memleak
>>>>
>>>>        I actually have a race condition concern here - see bellow -
>>>>
>>>>        On 2021-03-17 3:43 a.m., Christian König wrote:
>>>>         > I was hoping Andrey would take a look since I'm really busy with
>>>>        other
>>>>         > work right now.
>>>>         >
>>>>         > Regards,
>>>>         > Christian.
>>>>         >
>>>>         > Am 17.03.21 um 07:46 schrieb Zhang, Jack (Jian):
>>>>         >> Hi, Andrey/Crhistian and Team,
>>>>         >>
>>>>         >> I didn't receive the reviewer's message from maintainers on
>>>>        panfrost
>>>>         >> driver for several days.
>>>>         >> Due to this patch is urgent for my current working project.
>>>>         >> Would you please help to give some review ideas?
>>>>         >>
>>>>         >> Many Thanks,
>>>>         >> Jack
>>>>         >> -----Original Message-----
>>>>         >> From: Zhang, Jack (Jian)
>>>>         >> Sent: Tuesday, March 16, 2021 3:20 PM
>>>>         >> To: dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>;
>>>>        amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>>         >> Koenig, Christian <Christian.Koenig@amd.com
>>>>        <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>>         >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>>        Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>>         >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>>        Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>>         >> Vizoso <tomeu.vizoso@collabora.com
>>>>        <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>>        <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>>         >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>>        avoid
>>>>         >> memleak
>>>>         >>
>>>>         >> [AMD Public Use]
>>>>         >>
>>>>         >> Ping
>>>>         >>
>>>>         >> -----Original Message-----
>>>>         >> From: Zhang, Jack (Jian)
>>>>         >> Sent: Monday, March 15, 2021 1:24 PM
>>>>         >> To: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>;
>>>>         >> dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>;
>>>>        amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>>         >> Koenig, Christian <Christian.Koenig@amd.com
>>>>        <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>>         >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>>        Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>>         >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>; Rob
>>>>        Herring <robh@kernel.org <mailto:robh@kernel.org>>; Tomeu
>>>>         >> Vizoso <tomeu.vizoso@collabora.com
>>>>        <mailto:tomeu.vizoso@collabora.com>>; Steven Price
>>>>        <steven.price@arm.com <mailto:steven.price@arm.com>>
>>>>         >> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to
>>>>        avoid
>>>>         >> memleak
>>>>         >>
>>>>         >> [AMD Public Use]
>>>>         >>
>>>>         >> Hi, Rob/Tomeu/Steven,
>>>>         >>
>>>>         >> Would you please help to review this patch for panfrost driver?
>>>>         >>
>>>>         >> Thanks,
>>>>         >> Jack Zhang
>>>>         >>
>>>>         >> -----Original Message-----
>>>>         >> From: Jack Zhang <Jack.Zhang1@amd.com <mailto:Jack.Zhang1@amd.com>>
>>>>         >> Sent: Monday, March 15, 2021 1:21 PM
>>>>         >> To: dri-devel@lists.freedesktop.org
>>>>        <mailto:dri-devel@lists.freedesktop.org>;
>>>>        amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>;
>>>>         >> Koenig, Christian <Christian.Koenig@amd.com
>>>>        <mailto:Christian.Koenig@amd.com>>; Grodzovsky, Andrey
>>>>         >> <Andrey.Grodzovsky@amd.com <mailto:Andrey.Grodzovsky@amd.com>>;
>>>>        Liu, Monk <Monk.Liu@amd.com <mailto:Monk.Liu@amd.com>>; Deng,
>>>>         >> Emily <Emily.Deng@amd.com <mailto:Emily.Deng@amd.com>>
>>>>         >> Cc: Zhang, Jack (Jian) <Jack.Zhang1@amd.com
>>>>        <mailto:Jack.Zhang1@amd.com>>
>>>>         >> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
>>>>        memleak
>>>>         >>
>>>>         >> re-insert Bailing jobs to avoid memory leak.
>>>>         >>
>>>>         >> V2: move re-insert step to drm/scheduler logic
>>>>         >> V3: add panfrost's return value for bailing jobs in case it hits
>>>>        the
>>>>         >> memleak issue.
>>>>         >>
>>>>         >> Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com
>>>>        <mailto:Jack.Zhang1@amd.com>>
>>>>         >> ---
>>>>         >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>>>>         >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>>>>         >>   drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>>>>         >>   drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>>>>         >>   include/drm/gpu_scheduler.h                | 1 +
>>>>         >>   5 files changed, 19 insertions(+), 6 deletions(-)
>>>>         >>
>>>>         >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>         >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>         >> index 79b9cc73763f..86463b0f936e 100644
>>>>         >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>         >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>         >> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct
>>>>         >> amdgpu_device *adev,
>>>>         >>                       job ? job->base.id : -1);
>>>>         >>             /* even we skipped this reset, still need to set the
>>>>        job
>>>>         >> to guilty */
>>>>         >> -        if (job)
>>>>         >> +        if (job) {
>>>>         >>               drm_sched_increase_karma(&job->base);
>>>>         >> +            r = DRM_GPU_SCHED_STAT_BAILING;
>>>>         >> +        }
>>>>         >>           goto skip_recovery;
>>>>         >>       }
>>>>         >>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>         >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>         >> index 759b34799221..41390bdacd9e 100644
>>>>         >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>         >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>         >> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat
>>>>         >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>         >>       struct amdgpu_job *job = to_amdgpu_job(s_job);
>>>>         >>       struct amdgpu_task_info ti;
>>>>         >>       struct amdgpu_device *adev = ring->adev;
>>>>         >> +    int ret;
>>>>         >>         memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>         >>   @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat
>>>>         >> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>         >>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>>>         >>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>>>         >> -        amdgpu_device_gpu_recover(ring->adev, job);
>>>>         >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>>         >> +        ret = amdgpu_device_gpu_recover(ring->adev, job);
>>>>         >> +        if (ret == DRM_GPU_SCHED_STAT_BAILING)
>>>>         >> +            return DRM_GPU_SCHED_STAT_BAILING;
>>>>         >> +        else
>>>>         >> +            return DRM_GPU_SCHED_STAT_NOMINAL;
>>>>         >>       } else {
>>>>         >>           drm_sched_suspend_timeout(&ring->sched);
>>>>         >>           if (amdgpu_sriov_vf(adev))
>>>>         >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>         >> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>         >> index 6003cfeb1322..e2cb4f32dae1 100644
>>>>         >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>         >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>         >> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat
>>>>         >> panfrost_job_timedout(struct drm_sched_job
>>>>         >>        * spurious. Bail out.
>>>>         >>        */
>>>>         >>       if (dma_fence_is_signaled(job->done_fence))
>>>>         >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>>         >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>>         >>         dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x,
>>>>         >> status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>>>>         >>           js,
>>>>         >> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
>>>>         >> panfrost_job_timedout(struct drm_sched_job
>>>>         >>         /* Scheduler is already stopped, nothing to do. */
>>>>         >>       if (!panfrost_scheduler_stop(&pfdev->js->queue[js],
>>>>        sched_job))
>>>>         >> -        return DRM_GPU_SCHED_STAT_NOMINAL;
>>>>         >> +        return DRM_GPU_SCHED_STAT_BAILING;
>>>>         >>         /* Schedule a reset if there's no reset in progress. */
>>>>         >>       if (!atomic_xchg(&pfdev->reset.pending, 1)) diff --git
>>>>         >> a/drivers/gpu/drm/scheduler/sched_main.c
>>>>         >> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>         >> index 92d8de24d0a1..a44f621fb5c4 100644
>>>>         >> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>         >> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>         >> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct
>>>>         >> work_struct *work)  {
>>>>         >>       struct drm_gpu_scheduler *sched;
>>>>         >>       struct drm_sched_job *job;
>>>>         >> +    int ret;
>>>>         >>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>         >> work_tdr.work);
>>>>         >>   @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct
>>>>         >> work_struct *work)
>>>>         >>           list_del_init(&job->list);
>>>>         >>           spin_unlock(&sched->job_list_lock);
>>>>         >>   -        job->sched->ops->timedout_job(job);
>>>>         >> +        ret = job->sched->ops->timedout_job(job);
>>>>         >>   +        if (ret == DRM_GPU_SCHED_STAT_BAILING) {
>>>>         >> +            spin_lock(&sched->job_list_lock);
>>>>         >> +            list_add(&job->node, &sched->ring_mirror_list);
>>>>         >> +            spin_unlock(&sched->job_list_lock);
>>>>         >> +        }
>>>>
>>>>
>>>>        At this point we don't hold the GPU reset locks anymore, and so we
>>>>        could be racing against another TDR thread from another scheduler ring
>>>>        of the same device or another XGMI hive member. The other thread might
>>>>        be in the middle of a lockless iteration of the mirror list
>>>>        (drm_sched_stop, drm_sched_start and drm_sched_resubmit), and so
>>>>        locking job_list_lock will not help. It looks like it's required to
>>>>        take all GPU reset locks here.
>>>>
>>>>        Andrey
>>>>
>>>>
>>>>         >>           /*
>>>>         >>            * Guilty job did complete and hence needs to be manually
>>>>         >> removed
>>>>         >>            * See drm_sched_stop doc.
>>>>         >> diff --git a/include/drm/gpu_scheduler.h
>>>>         >> b/include/drm/gpu_scheduler.h index 4ea8606d91fe..8093ac2427ef
>>>>        100644
>>>>         >> --- a/include/drm/gpu_scheduler.h
>>>>         >> +++ b/include/drm/gpu_scheduler.h
>>>>         >> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>>>>         >>       DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>>>>         >>       DRM_GPU_SCHED_STAT_NOMINAL,
>>>>         >>       DRM_GPU_SCHED_STAT_ENODEV,
>>>>         >> +    DRM_GPU_SCHED_STAT_BAILING,
>>>>         >>   };
>>>>         >>     /**
>>>>         >> --
>>>>         >> 2.25.1
>>>>         >> _______________________________________________
>>>>         >> amd-gfx mailing list
>>>>         >> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
>>>>         >>
>>>>
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>>         >>
>>>>         >
>>>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-03-30  6:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15  5:20 [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak Jack Zhang
2021-03-15  5:23 ` Zhang, Jack (Jian)
2021-03-16  7:19   ` Zhang, Jack (Jian)
2021-03-17  6:46     ` Zhang, Jack (Jian)
2021-03-17  7:43       ` Christian König
2021-03-17 14:50         ` Andrey Grodzovsky
2021-03-17 15:11           ` Zhang, Jack (Jian)
2021-03-18 10:41             ` Zhang, Jack (Jian)
2021-03-18 16:16               ` Andrey Grodzovsky
2021-03-25  9:51                 ` Zhang, Jack (Jian)
2021-03-25 16:32                   ` Andrey Grodzovsky
2021-03-26  2:23                     ` Zhang, Jack (Jian)
2021-03-26  9:05                       ` Christian König
2021-03-26 11:21                         ` 回复: " Liu, Monk
2021-03-26 14:51                           ` Christian König
2021-03-30  3:10                             ` Liu, Monk
2021-03-30  6:59                               ` Christian König
2021-03-22 15:29   ` Steven Price
2021-03-26  2:04     ` Zhang, Jack (Jian)
2021-03-26  9:07       ` Steven Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).