From: "Nicolai Hähnle" <nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Cc: "Nicolai Hähnle" <nicolai.haehnle-5C7GfCeVMHo@public.gmane.org>
Subject: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini
Date: Thu, 28 Sep 2017 16:55:30 +0200
Message-ID: <20170928145530.12844-5-nhaehnle@gmail.com>
In-Reply-To: <20170928145530.12844-1-nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: Nicolai Hähnle <nicolai.haehnle@amd.com>

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences are
never signaled, the buffer object is effectively leaked. Worse, the
buffer stays stuck wherever it happened to be at the time, possibly in VRAM.
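
The following is a minimal user-space sketch in C that makes the leak concrete. It assumes a simplified delayed-destroy pass in which a released buffer object may only be freed, and its memory placement given up, once every fence attached to it has signaled. The types `toy_fence` and `toy_bo` and the helper `toy_bo_try_destroy()` are hypothetical stand-ins. They are not TTM's real structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dma_fence: only records whether it signaled. */
struct toy_fence {
	bool signaled;
};

/* Hypothetical stand-in for a buffer object that user space has already
 * released, but whose last GPU job may still be pending. */
struct toy_bo {
	struct toy_fence *fences[4];	/* fences of jobs that use this BO */
	size_t num_fences;
	bool in_vram;			/* where the BO happens to live */
	bool freed;
};

static void toy_fence_signal(struct toy_fence *f)
{
	assert(!f->signaled);		/* a fence signals exactly once */
	f->signaled = true;
}

/* The BO counts as idle only once all of its fences have signaled. */
static bool toy_bo_is_idle(const struct toy_bo *bo)
{
	for (size_t i = 0; i < bo->num_fences; i++)
		if (!bo->fences[i]->signaled)
			return false;
	return true;
}

/* Simplified delayed-destroy pass: free the BO (and release its placement)
 * only when it is idle; otherwise leave it where it is and report "busy". */
static bool toy_bo_try_destroy(struct toy_bo *bo)
{
	if (bo->freed)
		return true;
	if (!toy_bo_is_idle(bo))
		return false;		/* still busy: memory stays allocated */
	bo->in_vram = false;
	bo->freed = true;
	return true;
}
```

While the fence stays unsignaled, every pass fails and the buffer keeps its placement indefinitely. Once the fence is signaled, the next pass frees the buffer. The real kernel path waits on the reservation object's fences (see the stack trace below) rather than polling a flag, so this sketch models only the dependency, not the mechanism.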

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

    [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
    [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
    [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
    [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
    [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
    [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
    [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
    [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
    [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
    [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
    [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
    [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
    [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
    [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
    [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
    [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
    [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 54eb77cffd9b..32a99e980d78 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
 					amd_sched_entity_is_idle(entity));
 	amd_sched_rq_remove_entity(rq, entity);
 	if (r) {
 		struct amd_sched_job *job;
 
 		/* Park the kernel for a moment to make sure it isn't processing
 		 * our enity.
 		 */
 		kthread_park(sched->thread);
 		kthread_unpark(sched->thread);
-		while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
+		while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
+			struct amd_sched_fence *s_fence = job->s_fence;
+			amd_sched_fence_scheduled(s_fence);
+			amd_sched_fence_finished(s_fence);
+			dma_fence_put(&s_fence->finished);
 			sched->ops->free_job(job);
+		}
 
 	}
 	kfifo_free(&entity->job_queue);
 }
 
 static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
 {
 	struct amd_sched_entity *entity =
 		container_of(cb, struct amd_sched_entity, cb);
 	entity->dependency = NULL;
-- 
2.11.0
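
The order of operations in the new drain loop can be modelled in user space as follows. This sketch rests on explicit assumptions. `toy_sched_fence`, `toy_job`, the fixed-size ring that stands in for the entity's `kfifo`, and the integer refcount that stands in for `dma_fence_put()` are all hypothetical. In the kernel, `amd_sched_fence_scheduled()` and `amd_sched_fence_finished()` signal the `scheduled` and `finished` dma_fences embedded in `struct amd_sched_fence`, and that signaling is what wakes any waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct amd_sched_fence: two one-shot flags
 * plus a reference count on the "finished" fence. */
struct toy_sched_fence {
	bool scheduled;
	bool finished;
	int refcount;
};

struct toy_job {
	struct toy_sched_fence *s_fence;
};

/* Hypothetical fixed-size FIFO standing in for entity->job_queue (a kfifo). */
#define TOY_QUEUE_CAP 8
struct toy_queue {
	struct toy_job *jobs[TOY_QUEUE_CAP];
	unsigned int head, tail;
};

static bool toy_queue_push(struct toy_queue *q, struct toy_job *job)
{
	if (q->tail - q->head == TOY_QUEUE_CAP)
		return false;		/* queue full */
	q->jobs[q->tail++ % TOY_QUEUE_CAP] = job;
	return true;
}

/* Like kfifo_out() in the loop condition: false once the queue is empty. */
static bool toy_queue_pop(struct toy_queue *q, struct toy_job **job)
{
	if (q->head == q->tail)
		return false;
	*job = q->jobs[q->head++ % TOY_QUEUE_CAP];
	return true;
}

static void toy_fence_put(struct toy_sched_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		free(f);
}

/* Drain loop modelled on the patched amd_sched_entity_fini(): every job that
 * never reached the hardware still gets its fences signaled, in order, before
 * the job itself is freed. Returns the number of jobs drained. */
static int toy_entity_fini(struct toy_queue *q)
{
	struct toy_job *job;
	int drained = 0;

	while (toy_queue_pop(q, &job)) {
		struct toy_sched_fence *s_fence = job->s_fence;

		s_fence->scheduled = true;	/* ~ amd_sched_fence_scheduled() */
		s_fence->finished = true;	/* ~ amd_sched_fence_finished() */
		toy_fence_put(s_fence);		/* ~ dma_fence_put(&s_fence->finished) */
		free(job);			/* ~ sched->ops->free_job(job) */
		drained++;
	}
	return drained;
}
```

In this model, a consumer that took its own reference to the fence still sees `finished` set after teardown and drops that reference later, because the loop releases only one reference. Signaling `scheduled` before `finished` keeps the two fences in the same order they would follow for a job that actually ran.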

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Thread overview: 36+ messages
2017-09-28 14:55 [PATCH 1/5] drm/amd/sched: rename amd_sched_entity_pop_job Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 2/5] drm/amd/sched: fix an outdated comment Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 3/5] drm/amd/sched: move adding finish callback to amd_sched_job_begin Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 4/5] drm/amd/sched: NULL out the s_fence field after run_job Nicolai Hähnle
2017-09-28 18:39       ` Andres Rodriguez
2017-09-28 19:04           ` Nicolai Hähnle
2017-09-28 14:55   ` Nicolai Hähnle [this message]
2017-09-28 15:01       ` [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini Christian König
2017-10-02 16:00           ` Tom St Denis
2017-10-09  6:42           ` Liu, Monk
2017-10-09  8:02               ` Christian König
2017-10-09 10:14                   ` Nicolai Hähnle
2017-10-09 10:35                       ` Liu, Monk
2017-10-09 10:49                           ` Nicolai Hähnle
2017-10-09 10:59                               ` Christian König
2017-10-09 11:04                                   ` Nicolai Hähnle
2017-10-09 11:12                                       ` Christian König
2017-10-09 11:27                                           ` Nicolai Hähnle
2017-10-09 12:33                                               ` Christian König
2017-10-09 12:58                                                   ` Nicolai Hähnle
2017-10-09 13:57                                                       ` Olsak, Marek
2017-10-09 14:01                                                           ` Nicolai Hähnle
2017-10-10  4:00                                                   ` Liu, Monk
2017-09-28 18:30       ` Marek Olšák
2017-09-29  2:17       ` Chunming Zhou
2017-10-11 16:30       ` Michel Dänzer
2017-10-12  8:05           ` Christian König
2017-10-12 11:00               ` Michel Dänzer
2017-10-12 11:44                   ` Christian König
2017-10-12 13:42                       ` Michel Dänzer
2017-10-12 13:50                           ` Christian König
2017-10-12 14:04                               ` Michel Dänzer
2017-10-12 16:49                   ` Michel Dänzer
2017-10-12 17:11                       ` Christian König
2017-10-13 14:34                           ` Michel Dänzer
2017-10-13 15:20                               ` Christian König
