* Fixes for scheduler hang when killing a process
@ 2022-10-14  8:46 Christian König
From: Christian König @ 2022-10-14  8:46 UTC
  To: luben.tuikov, dri-devel, amd-gfx

Hi guys,

I rebased these patches on top of amd-staging-drm-next since the
amdgpu changes are quite substantial.

Please review and comment,
Christian.



* Re: [PATCH 10/13] drm/amdgpu: use scheduler depenencies for CS
@ 2022-12-21 21:59 Bert Karwatzki
From: Bert Karwatzki @ 2022-12-21 21:59 UTC
  To: amd-gfx; +Cc: Mike Lothian

Can you test if this solves the freezes:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 919bbea2e3ac..4e684c2afc70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1506,7 +1509,8 @@ u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
 uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
 					    uint32_t domain)
 {
-	if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) {
+	if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
+	    ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
 		domain = AMDGPU_GEM_DOMAIN_VRAM;
 		if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
 			domain = AMDGPU_GEM_DOMAIN_GTT;
 

This solves a lot of seemingly unrelated errors:
https://gitlab.freedesktop.org/drm/amd/-/issues/2255
https://gitlab.freedesktop.org/drm/amd/-/issues/2270
https://gitlab.freedesktop.org/drm/amd/-/issues/2281
https://gitlab.freedesktop.org/drm/amd/-/issues/2282
https://gitlab.freedesktop.org/drm/amd/-/issues/2291

Bert Karwatzki
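
For readers who want to check the effect of the changed condition outside the kernel, here is a minimal userspace sketch of the patched decision. Everything in it is a stand-in: `struct model_dev`, `preferred_domain()`, the `DOMAIN_*` flag values and `SG_THRESHOLD` are hypothetical names and placeholder values chosen for illustration, not the definitions from the amdgpu headers. It only shows that, with the extra ASIC check, a combined VRAM|GTT request is narrowed on Carrizo and Stoney and passed through unchanged everywhere else.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only; the real flags and the
 * real threshold are defined in the amdgpu headers. */
#define DOMAIN_GTT   0x2u
#define DOMAIN_VRAM  0x4u
#define SG_THRESHOLD (256ull * 1024 * 1024)

/* Hypothetical subset of ASIC types, enough to exercise the check. */
enum chip { CHIP_CARRIZO, CHIP_STONEY, CHIP_OTHER };

/* Hypothetical stand-in for the two device fields the patch reads. */
struct model_dev {
	enum chip asic_type;
	uint64_t real_vram_size;
};

/* Models the patched condition: only narrow a combined VRAM|GTT
 * request on Carrizo/Stoney; other ASICs keep the requested mask. */
static uint32_t preferred_domain(const struct model_dev *dev, uint32_t domain)
{
	bool both = domain == (DOMAIN_VRAM | DOMAIN_GTT);
	bool cz_or_st = dev->asic_type == CHIP_CARRIZO ||
			dev->asic_type == CHIP_STONEY;

	if (both && cz_or_st) {
		domain = DOMAIN_VRAM;
		if (dev->real_vram_size <= SG_THRESHOLD)
			domain = DOMAIN_GTT;
	}
	return domain;
}
```

Before the change, the narrowing applied to every ASIC; in this model that corresponds to dropping the `cz_or_st` term from the condition.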


* [PATCH 01/13] drm/scheduler: fix fence ref counting
@ 2022-09-29 13:21 Christian König
From: Christian König @ 2022-09-29 13:21 UTC
  To: dri-devel
  Cc: shansheng.wang, Christian König, luben.tuikov, WenChieh.Chien

We leaked dependency fences when processes were being killed.

In addition, grab a reference to the last scheduled fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 191c56064f19..1bb1437a8fed 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -207,6 +207,7 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
 						 finish_cb);
 
+	dma_fence_put(f);
 	init_irq_work(&job->work, drm_sched_entity_kill_jobs_irq_work);
 	irq_work_queue(&job->work);
 }
@@ -234,8 +235,10 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 		struct drm_sched_fence *s_fence = job->s_fence;
 
 		/* Wait for all dependencies to avoid data corruptions */
-		while ((f = drm_sched_job_dependency(job, entity)))
+		while ((f = drm_sched_job_dependency(job, entity))) {
 			dma_fence_wait(f, false);
+			dma_fence_put(f);
+		}
 
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
@@ -250,6 +253,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 			continue;
 		}
 
+		dma_fence_get(entity->last_scheduled);
 		r = dma_fence_add_callback(entity->last_scheduled,
 					   &job->finish_cb,
 					   drm_sched_entity_kill_jobs_cb);
-- 
2.25.1
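
The reference-counting rule behind the three hunks above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct toy_fence`, `struct toy_job` and all `toy_*` functions are hypothetical stand-ins, not the kernel's `struct dma_fence` API, and it assumes (as the added `dma_fence_put()` calls imply) that each fence returned by the dependency lookup carries a reference the caller must drop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted fence; not struct dma_fence. */
struct toy_fence {
	int refcount;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	if (!f)
		return;
	assert(f->refcount > 0);	/* catch unbalanced puts */
	f->refcount--;
}

/* Hypothetical job: hands out each dependency once, passing the
 * reference it holds on to the caller. */
struct toy_job {
	struct toy_fence **deps;
	size_t num_deps;
	size_t next;
};

static struct toy_fence *toy_job_dependency(struct toy_job *job)
{
	if (job->next >= job->num_deps)
		return NULL;
	return job->deps[job->next++];
}

/* Mirrors the fixed loop: every returned fence is released after use. */
static void toy_kill_job(struct toy_job *job)
{
	struct toy_fence *f;

	while ((f = toy_job_dependency(job))) {
		/* a real implementation would wait on f here */
		toy_fence_put(f);
	}
}

/* Mirrors the callback pairing: take a reference before registering
 * the callback ... */
static struct toy_fence *toy_register_cb(struct toy_fence *last_scheduled)
{
	return toy_fence_get(last_scheduled);
}

/* ... and drop that reference when the callback runs. */
static void toy_kill_jobs_cb(struct toy_fence *f)
{
	toy_fence_put(f);
}
```

In this model, dropping the put in `toy_kill_job()` leaves each dependency with a non-zero count forever, which is the leak the patch describes; the get in `toy_register_cb()` is what makes the put in the callback balanced.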



Thread overview: 50+ messages
2022-10-14  8:46 Fixes for scheduler hang when killing a process Christian König
2022-10-14  8:46 ` [PATCH 01/13] drm/scheduler: fix fence ref counting Christian König
2022-10-25  3:23   ` Luna Nova
2022-10-25 11:35     ` Christian König
2022-10-14  8:46 ` [PATCH 02/13] drm/scheduler: add drm_sched_job_add_resv_dependencies Christian König
2022-10-14  8:46 ` [PATCH 03/13] drm/amdgpu: use drm_sched_job_add_resv_dependencies for moves Christian König
2022-10-14  8:46 ` [PATCH 04/13] drm/amdgpu: drop the fence argument from amdgpu_vmid_grab Christian König
2022-10-14  8:46 ` [PATCH 05/13] drm/amdgpu: drop amdgpu_sync " Christian König
2022-10-23  1:25   ` Luben Tuikov
2022-10-24 10:54     ` Christian König
2022-10-14  8:46 ` [PATCH 06/13] drm/amdgpu: cleanup scheduler job initialization Christian König
2022-10-23  1:50   ` Luben Tuikov
2022-10-14  8:46 ` [PATCH 07/13] drm/amdgpu: move explicit sync check into the CS Christian König
2022-10-14  8:46 ` [PATCH 08/13] drm/amdgpu: use scheduler depenencies for VM updates Christian König
2022-10-24  5:50   ` Luben Tuikov
2022-10-14  8:46 ` [PATCH 09/13] drm/amdgpu: use scheduler depenencies for UVD msgs Christian König
2022-10-24  5:53   ` Luben Tuikov
2022-10-14  8:46 ` [PATCH 10/13] drm/amdgpu: use scheduler depenencies for CS Christian König
2022-10-24  5:55   ` Luben Tuikov
2022-12-21 15:34   ` Mike Lothian
2022-12-21 15:47     ` Mike Lothian
2022-12-21 15:52     ` Luben Tuikov
2022-12-21 15:55       ` Mike Lothian
2022-10-14  8:46 ` [PATCH 11/13] drm/scheduler: remove drm_sched_dependency_optimized Christian König
2022-10-14  8:46 ` [PATCH 12/13] drm/scheduler: rework entity flush, kill and fini Christian König
2022-11-17  2:36   ` Dmitry Osipenko
2022-11-17  9:53     ` Christian König
2022-11-17 12:47       ` Dmitry Osipenko
2022-11-17 12:55         ` Christian König
2022-11-17 12:59           ` Dmitry Osipenko
2022-11-17 13:00             ` Dmitry Osipenko
2022-11-17 13:11               ` Christian König
2022-11-17 14:41                 ` Dmitry Osipenko
2022-11-17 15:09                   ` Christian König
2022-11-17 15:11                     ` Dmitry Osipenko
2022-12-28 16:27                       ` Rob Clark
2022-12-28 16:52                         ` Rob Clark
2023-01-01 18:29                           ` youling257
2023-01-02  9:24                             ` Dmitry Osipenko
2023-01-02 14:17                               ` youling 257
2023-01-02 14:17                                 ` youling 257
2023-01-02 15:08                                 ` Dmitry Osipenko
2023-01-02 15:08                                   ` Dmitry Osipenko
2022-12-26 16:01   ` [12/13] " Jonathan Marek
2022-10-14  8:46 ` [PATCH 13/13] drm/scheduler: rename dependency callback into prepare_job Christian König
2022-10-23  1:35 ` Fixes for scheduler hang when killing a process Luben Tuikov
2022-10-24  7:00 ` Luben Tuikov
  -- strict thread matches above, loose matches on Subject: below --
2022-12-21 21:59 [PATCH 10/13] drm/amdgpu: use scheduler depenencies for CS Bert Karwatzki
2022-12-21 21:12 Bert Karwatzki
2022-09-29 13:21 [PATCH 01/13] drm/scheduler: fix fence ref counting Christian König
2022-09-29 13:21 ` [PATCH 10/13] drm/amdgpu: use scheduler depenencies for CS Christian König
