All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tom St Denis <tom.stdenis-5C7GfCeVMHo@public.gmane.org>
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini
Date: Mon, 2 Oct 2017 12:00:36 -0400	[thread overview]
Message-ID: <0a4dd01c-c598-5622-b85d-097449291f38@amd.com> (raw)
In-Reply-To: <3032bef3-4829-8cae-199a-11353b38c49a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hi Nicolai,

Ping?  :-)

Tom


On 28/09/17 11:01 AM, Christian König wrote:
> Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:
>> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>
>> Highly concurrent Piglit runs can trigger a race condition where a 
>> pending
>> SDMA job on a buffer object is never executed because the corresponding
>> process is killed (perhaps due to a crash). Since the job's fences were
>> never signaled, the buffer object was effectively leaked. Worse, the
>> buffer was stuck wherever it happened to be at the time, possibly in 
>> VRAM.
>>
>> The symptom was user space processes stuck in interruptible waits with
>> kernel stacks like:
>>
>>      [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
>>      [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
>>      [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
>>      [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
>>      [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
>>      [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
>>      [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
>>      [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
>>      [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 
>> [amdgpu]
>>      [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
>>      [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
>>      [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
>>      [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
>>      [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>>      [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
>>      [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
>>      [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
>>      [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>> Acked-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> index 54eb77cffd9b..32a99e980d78 100644
>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct 
>> amd_gpu_scheduler *sched,
>>                       amd_sched_entity_is_idle(entity));
>>       amd_sched_rq_remove_entity(rq, entity);
>>       if (r) {
>>           struct amd_sched_job *job;
>>           /* Park the kernel for a moment to make sure it isn't 
>> processing
>>            * our enity.
>>            */
>>           kthread_park(sched->thread);
>>           kthread_unpark(sched->thread);
>> -        while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
>> +        while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
>> +            struct amd_sched_fence *s_fence = job->s_fence;
>> +            amd_sched_fence_scheduled(s_fence);
> 
> It would be really nice to have an error code set on s_fence->finished 
> before it is signaled, use dma_fence_set_error() for this.
> 
> Additional to that it would be nice to note in the subject line that 
> this is a rather important bug fix.
> 
> With that fixed the whole series is Reviewed-by: Christian König 
> <christian.koenig@amd.com>.
> 
> Regards,
> Christian.
> 
>> +            amd_sched_fence_finished(s_fence);
>> +            dma_fence_put(&s_fence->finished);
>>               sched->ops->free_job(job);
>> +        }
>>       }
>>       kfifo_free(&entity->job_queue);
>>   }
>>   static void amd_sched_entity_wakeup(struct dma_fence *f, struct 
>> dma_fence_cb *cb)
>>   {
>>       struct amd_sched_entity *entity =
>>           container_of(cb, struct amd_sched_entity, cb);
>>       entity->dependency = NULL;
> 
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  parent reply	other threads:[~2017-10-02 16:00 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-28 14:55 [PATCH 1/5] drm/amd/sched: rename amd_sched_entity_pop_job Nicolai Hähnle
     [not found] ` <20170928145530.12844-1-nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-28 14:55   ` [PATCH 2/5] drm/amd/sched: fix an outdated comment Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 3/5] drm/amd/sched: move adding finish callback to amd_sched_job_begin Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 4/5] drm/amd/sched: NULL out the s_fence field after run_job Nicolai Hähnle
     [not found]     ` <20170928145530.12844-4-nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-28 18:39       ` Andres Rodriguez
     [not found]         ` <7064b408-60db-2817-0ae7-af6b2c56580b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-28 19:04           ` Nicolai Hähnle
2017-09-28 14:55   ` [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini Nicolai Hähnle
     [not found]     ` <20170928145530.12844-5-nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-28 15:01       ` Christian König
     [not found]         ` <3032bef3-4829-8cae-199a-11353b38c49a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-10-02 16:00           ` Tom St Denis [this message]
2017-10-09  6:42           ` Liu, Monk
     [not found]             ` <BLUPR12MB044904A26E01C265C49042E484740-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-09  8:02               ` Christian König
     [not found]                 ` <11f21e54-16b8-68e4-c63e-d791ef8bbffa-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-10-09 10:14                   ` Nicolai Hähnle
     [not found]                     ` <d0f66c04-fbcd-09a2-6e4c-9de9ca7a93ff-5C7GfCeVMHo@public.gmane.org>
2017-10-09 10:35                       ` Liu, Monk
     [not found]                         ` <BLUPR12MB044925932C8D956F93CAF93E84740-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-09 10:49                           ` Nicolai Hähnle
     [not found]                             ` <7e338e23-540c-4e2e-982f-f0eb623c75b1-5C7GfCeVMHo@public.gmane.org>
2017-10-09 10:59                               ` Christian König
     [not found]                                 ` <760c1434-0739-81ff-82c3-a5210c5575d3-5C7GfCeVMHo@public.gmane.org>
2017-10-09 11:04                                   ` Nicolai Hähnle
     [not found]                                     ` <de5e2c7c-b6cd-1c24-4d8e-7ae3cdfad0bd-5C7GfCeVMHo@public.gmane.org>
2017-10-09 11:12                                       ` Christian König
     [not found]                                         ` <9619ebd2-f218-7568-3b24-0a9d2b008a6a-5C7GfCeVMHo@public.gmane.org>
2017-10-09 11:27                                           ` Nicolai Hähnle
     [not found]                                             ` <de68c0ca-f36e-3adb-2c42-83a5176f07d8-5C7GfCeVMHo@public.gmane.org>
2017-10-09 12:33                                               ` Christian König
     [not found]                                                 ` <2f113fd3-ab4a-58b8-31d8-dc0a23751513-5C7GfCeVMHo@public.gmane.org>
2017-10-09 12:58                                                   ` Nicolai Hähnle
     [not found]                                                     ` <1a79e19c-a654-f5c7-84d9-ce4cce76243f-5C7GfCeVMHo@public.gmane.org>
2017-10-09 13:57                                                       ` Olsak, Marek
     [not found]                                                         ` <CY1PR12MB0885AF7148CD8ECE929E96D2F9740-1s8aH8ViOEfCYw/MNJAFQgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-10-09 14:01                                                           ` Nicolai Hähnle
2017-10-10  4:00                                                   ` Liu, Monk
2017-09-28 18:30       ` Marek Olšák
2017-09-29  2:17       ` Chunming Zhou
2017-10-11 16:30       ` Michel Dänzer
     [not found]         ` <7cb63e4c-9b65-b9b9-14dc-26368ca7126a-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-10-12  8:05           ` Christian König
     [not found]             ` <c67d1bd8-81a0-4133-c3df-dd2a1b1a8c11-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-10-12 11:00               ` Michel Dänzer
     [not found]                 ` <51ec8d88-32eb-ef4a-b34b-d2fd8e23281e-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-10-12 11:44                   ` Christian König
     [not found]                     ` <4c750ed5-98be-eafa-e684-940ecb2787f0-5C7GfCeVMHo@public.gmane.org>
2017-10-12 13:42                       ` Michel Dänzer
     [not found]                         ` <bc0e87da-a632-07ce-6934-86aee099b916-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-10-12 13:50                           ` Christian König
     [not found]                             ` <609e2516-d783-597c-d771-21dc89091043-5C7GfCeVMHo@public.gmane.org>
2017-10-12 14:04                               ` Michel Dänzer
2017-10-12 16:49                   ` Michel Dänzer
     [not found]                     ` <6b509b43-a6e9-175b-7d64-87e38c5ea4e2-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-10-12 17:11                       ` Christian König
     [not found]                         ` <fcb5f430-5912-0feb-a586-eaf710433d8d-5C7GfCeVMHo@public.gmane.org>
2017-10-13 14:34                           ` Michel Dänzer
     [not found]                             ` <8ab106b9-363b-4fb2-6f1a-727a5e0e7bc5-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-10-13 15:20                               ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a4dd01c-c598-5622-b85d-097449291f38@amd.com \
    --to=tom.stdenis-5c7gfcevmho@public.gmane.org \
    --cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.