From: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, David.Panariti@amd.com,
	oleg@redhat.com, ebiederm@xmission.com,
	Alexander.Deucher@amd.com, akpm@linux-foundation.org,
	Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Wed, 25 Apr 2018 09:43:48 -0400
Message-ID: <615cd01e-8c8e-910d-8f04-5576ab986ac0@amd.com>
In-Reply-To: <20180424214027.GG25142@phenom.ffwll.local>



On 04/24/2018 05:40 PM, Daniel Vetter wrote:
> On Tue, Apr 24, 2018 at 05:02:40PM -0400, Andrey Grodzovsky wrote:
>>
>> On 04/24/2018 03:44 PM, Daniel Vetter wrote:
>>> On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
>>>> Adding the dri-devel list, since this is driver independent code.
>>>>
>>>>
>>>> On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
>>>>> Avoid calling wait_event_killable when you are possibly being called
>>>>> from get_signal routine since in that case you end up in a deadlock
>>>>> where you are alreay blocked in singla processing any trying to wait
>>>> Multiple typos here, "[...] already blocked in signal processing and [...]"?
>>>>
>>>>
>>>>> on a new signal.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> index 088ff2b..09fd258 100644
>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>>>>    		return;
>>>>>    	/**
>>>>>    	 * The client will not queue more IBs during this fini, consume existing
>>>>> -	 * queued IBs or discard them on SIGKILL
>>>>> +	 * queued IBs or discard them when in death signal state since
>>>>> +	 * wait_event_killable can't receive signals in that state.
>>>>>    	*/
>>>>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>>>>> +	if (current->flags & PF_SIGNALED)
>>> You want fatal_signal_pending() here, instead of inventing your own broken
>>> version.
>> I rely on current->flags & PF_SIGNALED because it is set from within
>> get_signal, meaning I am within signal processing, in which case I want
>> to avoid any signal-based wait for that task.
>> From what I see in the code, task_struct.pending.signal is set for
>> other threads in the same group (zap_other_threads) or in other
>> scenarios; those tasks are still able to receive signals, so calling
>> wait_event_killable there will not be a problem.
>>>>>    		entity->fini_status = -ERESTARTSYS;
>>>>>    	else
>>>>>    		entity->fini_status = wait_event_killable(sched->job_scheduled,
>>> But really this smells like a bug in wait_event_killable, since
>>> wait_event_interruptible does not suffer from the same bug. It will return
>>> immediately when there's a signal pending.
>> Even when wait_event_interruptible is called as follows -
>> ...->do_signal->get_signal->....->wait_event_interruptible ?
>> I haven't tried it, but wait_event_interruptible is very similar to
>> wait_event_killable, so I would assume it also will not be interrupted
>> if called like that. (I will give it a try anyway, out of curiosity.)
> wait_event_killable doesn't check for fatal_signal_pending before calling
> schedule, so definitely has a nice race there.
>
> But if you're sure that you really need to check PF_SIGNALED, then I'm
> honestly not clear on what you're trying to pull off here. Your sparse
> explanation of what happens isn't enough, since I have no idea how you can
> get from get_signal() to the above wait_event_killable callsite.

A fatal signal triggers process termination, during which all FDs are
released, including the DRM ones.

See here -

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0

(Per Eric's explanation above, this is triggered by do_exit->exit_files)
...
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
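
To make the difference between the old and the new check concrete, here is
a minimal user-space sketch of the decision only. It is a model, not kernel
code: struct model_task, MODEL_PF_SIGNALED, enum fini_action and both helper
functions are hypothetical names invented for illustration, and the flag
value is a placeholder rather than the kernel's definition. It only shows
which tasks would skip the killable wait under each version of the
condition quoted in the diff.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Placeholder bit standing in for the kernel's PF_SIGNALED flag (assumption). */
#define MODEL_PF_SIGNALED 0x1u

/* Hypothetical stand-in for the two task fields the check looks at. */
struct model_task {
	unsigned int flags;	/* models current->flags */
	int exit_code;		/* models current->exit_code */
};

enum fini_action {
	FINI_WAIT_KILLABLE,	/* would call wait_event_killable() */
	FINI_DISCARD		/* would set fini_status = -ERESTARTSYS */
};

/* Condition before the patch: discard only when dying from SIGKILL. */
static enum fini_action fini_action_old(const struct model_task *t)
{
	assert(t != NULL);
	if ((t->flags & MODEL_PF_SIGNALED) && t->exit_code == SIGKILL)
		return FINI_DISCARD;
	return FINI_WAIT_KILLABLE;
}

/* Condition after the patch: discard whenever the task is dying from a signal. */
static enum fini_action fini_action_new(const struct model_task *t)
{
	assert(t != NULL);
	if (t->flags & MODEL_PF_SIGNALED)
		return FINI_DISCARD;
	return FINI_WAIT_KILLABLE;
}
```

In this model, a task dying from a signal other than SIGKILL takes the wait
path under the old condition and the discard path under the new one, which
is the case the stack trace above runs into.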

Andrey

> -Daniel
>
>> Andrey
>>
>>> I think this should be fixed in core code, not papered over in some
>>> subsystem.
>>> -Daniel
>>>
>>>> -- 
>>>> Earthling Michel Dänzer               |               http://www.amd.com
>>>> Libre software enthusiast             |             Mesa and X developer
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


