From: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	David.Panariti@amd.com, oleg@redhat.com, ebiederm@xmission.com,
	Alexander.Deucher@amd.com, akpm@linux-foundation.org,
	Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Wed, 25 Apr 2018 09:43:48 -0400
Message-ID: <615cd01e-8c8e-910d-8f04-5576ab986ac0@amd.com>
In-Reply-To: <20180424214027.GG25142@phenom.ffwll.local>

On 04/24/2018 05:40 PM, Daniel Vetter wrote:
> On Tue, Apr 24, 2018 at 05:02:40PM -0400, Andrey Grodzovsky wrote:
>>
>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>> Adding the dri-devel list, since this is driver independent code.
>>>>
>>>>
>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>> where you are alreay blocked in singla processing any trying to wait
>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>
>>>>
>>>>> on a new signal.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> index 088ff2b..09fd258 100644
>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>   		return;
>>>>>   	/**
>>>>>   	 * The client will not queue more IBs during this fini, consume existing
>>>>> -	 * queued IBs or discard them on SIGKILL
>>>>> +	 * queued IBs or discard them when in death signal state since
>>>>> +	 * wait_event_killable can't receive signals in that state.
>>>>>   	 */
>>>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>> +	if (current->flags & PF_SIGNALED)
>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>> version.
>> I rely on current->flags & PF_SIGNALED because this being set from within
>> get_signal,
>> meaning I am within signal processing in which case I want to avoid any
>> signal based wait for that task,
>> From what i see in the code, task_struct.pending.signal is being set for
>> other threads in same
>> group (zap_other_threads) or for other scenarios, those task are still able
>> to receive signals
>> so calling wait_event_killable there will not have problem.
>>>>>   		entity->fini_status = -ERESTARTSYS;
>>>>>   	else
>>>>>   		entity->fini_status = wait_event_killable(sched->job_scheduled,
>>> But really this smells like a bug in wait_event_killable, since
>>> wait_event_interruptible does not suffer from the same bug. It will return
>>> immediately when there's a signal pending.
>> Even when wait_event_interruptible is called as following -
>> ...->do_signal->get_signal->....->wait_event_interruptible ?
>> I haven't tried it but wait_event_interruptible is very much alike to
>> wait_event_killable so I would assume it will also
>> not be interrupted if called like that. (Will give it a try just out of
>> curiosity anyway)
> wait_event_killabel doesn't check for fatal_signal_pending before calling
> schedule, so definitely has a nice race there.
>
> But if you're sure that you really need to check PF_SIGNALED, then I'm
> honestly not clear on what you're trying to pull off here. Your sparse
> explanation of what happens isn't enough, since I have no idea how you can
> get from get_signal() to the above wait_event_killable callsite.

Fatal signal will trigger process termination during which all FDs are
released, including DRM's. See here -

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0 (From Eric's explanation above is triggered by do_exit->exit_files)
...
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Andrey

> -Daniel
>
>> Andrey
>>
>>> I think this should be fixed in core code, not papered over in some
>>> subsystem.
>>> -Daniel
>>>
>>>> --
>>>> Earthling Michel Dänzer               |               http://www.amd.com
>>>> Libre software enthusiast             |             Mesa and X developer
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel