All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>,
	christian.koenig@amd.com, David.Panariti@amd.com,
	Oleg Nesterov <oleg@redhat.com>,
	amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Alexander.Deucher@amd.com, akpm@linux-foundation.org
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Mon, 30 Apr 2018 11:25:47 -0500	[thread overview]
Message-ID: <87k1so8xv8.fsf@xmission.com> (raw)
In-Reply-To: <c3c9787d-b279-8169-43d1-74eeb666ffbd@gmail.com> ("Christian \=\?utf-8\?Q\?K\=C3\=B6nig\=22's\?\= message of "Mon, 30 Apr 2018 14:08:44 +0200")

Christian König <ckoenig.leichtzumerken@gmail.com> writes:

> Hi Eric,
>
> sorry for the late response, was on vacation last week.
>
> Am 26.04.2018 um 02:01 schrieb Eric W. Biederman:
>> Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:
>>
>>> On 04/25/2018 01:17 PM, Oleg Nesterov wrote:
>>>> On 04/25, Andrey Grodzovsky wrote:
>>>>> here (drm_sched_entity_fini) is also a bad idea, but we still want to be
>>>>> able to exit immediately
>>>>> and not wait for GPU jobs completion when the reason for reaching this code
>>>>> is because of KILL
>>>>> signal to the user process who opened the device file.
>>>> Can you hook f_op->flush method?
>
> THANKS! That sounds like a really good idea to me and we haven't investigated
> into that direction yet.

For the backwards compatibility concerns you cite below the flush method
seems a much better place to introduce the wait.  You at least really
will be in a process context for that.  Still might be in exit but at
least you will be legitimately be in a process.

>>> But this one is called for each task releasing a reference to the the file, so
>>> not sure I see how this solves the problem.
>> The big question is why do you need to wait during the final closing a
>> file?
>
> As always it's because of historical reasons. Initially user space pushed
> commands directly to a hardware queue and when a processes finished we didn't
> need to wait for anything.
>
> Then the GPU scheduler was introduced which delayed pushing the jobs to the
> hardware queue to a later point in time.
>
> This wait was then added to maintain backward compability and not break
> userspace (but see below).

That make sense.

>> The wait can be terminated so the wait does not appear to be simply a
>> matter of correctness.
>
> Well when the process is killed we don't care about correctness any more, we
> just want to get rid of it as quickly as possible (OOM situation etc...).
>
> But it is perfectly possible that a process submits some render commands and
> then calls exit() or terminates because of a SIGTERM, SIGINT etc.. In this case
> we need to wait here to make sure that all rendering is pushed to the hardware
> because the scheduler might need resources/settings from the file
> descriptor.
>
> For example if you just remove that wait you could close firefox and get garbage
> on the screen for a millisecond because the remaining rendering commands where
> not executed.
>
> So what we essentially need is to distinct between a SIGKILL (which means stop
> processing as soon as possible) and any other reason because then we don't want
> to annoy the user with garbage on the screen (even if it's just for a few
> milliseconds).

I see a couple of issues.

- Running the code in release rather than in flush.

Using flush will catch every close so it should be more backwards
compatible.  f_op->flush always runs in process context so looking at
current makes sense.

- Distinguishing between death by SIGKILL and other process exit deaths.

In f_op->flush the code can test "((tsk->flags & PF_EXITING) &&
(tsk->code == SIGKILL))" to see if it was SIGKILL that terminated
the process.

- Dealing with stuck queues (where this patchset came in).

For stuck queues you are going to need a timeout instead of the current
indefinite wait after PF_EXITING is set.  From what you have described a
few milliseconds should be enough.  If PF_EXITING is not set you can
still just make the wait killable and skip the timeout if that will give
a better backwards compatible user experience.

What can't be done is try and catch SIGKILL after a process has called
do_exit.  A dead process is a dead process.

Eric

  parent reply	other threads:[~2018-04-30 16:25 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-24 15:30 Avoid uninterruptible sleep during process exit Andrey Grodzovsky
2018-04-24 15:30 ` Andrey Grodzovsky
2018-04-24 15:30 ` [PATCH 1/3] signals: Allow generation of SIGKILL to exiting task Andrey Grodzovsky
2018-04-24 15:30   ` Andrey Grodzovsky
2018-04-24 16:10   ` Eric W. Biederman
2018-04-24 16:10     ` Eric W. Biederman
2018-04-24 16:42   ` Eric W. Biederman
2018-04-24 16:42     ` Eric W. Biederman
2018-04-24 16:51     ` Andrey Grodzovsky
2018-04-24 16:51       ` Andrey Grodzovsky
2018-04-24 17:29       ` Eric W. Biederman
2018-04-25 13:13   ` Oleg Nesterov
2018-04-24 15:30 ` [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process Andrey Grodzovsky
2018-04-24 15:30   ` Andrey Grodzovsky
2018-04-24 15:46   ` Michel Dänzer
2018-04-24 15:51     ` Andrey Grodzovsky
2018-04-24 15:51       ` Andrey Grodzovsky
2018-04-24 15:52     ` Andrey Grodzovsky
2018-04-24 15:52       ` Andrey Grodzovsky
2018-04-24 19:44     ` Daniel Vetter
2018-04-24 19:44       ` Daniel Vetter
2018-04-24 21:00       ` Eric W. Biederman
2018-04-24 21:02       ` Andrey Grodzovsky
2018-04-24 21:02         ` Andrey Grodzovsky
2018-04-24 21:21         ` Eric W. Biederman
2018-04-24 21:37           ` Andrey Grodzovsky
2018-04-24 21:37             ` Andrey Grodzovsky
2018-04-24 22:11             ` Eric W. Biederman
2018-04-25  7:14             ` Daniel Vetter
2018-04-25 13:08               ` Andrey Grodzovsky
2018-04-25 13:08                 ` Andrey Grodzovsky
2018-04-25 15:29                 ` Eric W. Biederman
2018-04-25 16:13                   ` Andrey Grodzovsky
2018-04-25 16:31                     ` Eric W. Biederman
2018-04-24 21:40         ` Daniel Vetter
2018-04-24 21:40           ` Daniel Vetter
2018-04-25 13:22           ` Oleg Nesterov
2018-04-25 13:36             ` Daniel Vetter
2018-04-25 14:18               ` Oleg Nesterov
2018-04-25 14:18                 ` Oleg Nesterov
2018-04-25 13:43           ` Andrey Grodzovsky
2018-04-25 13:43             ` Andrey Grodzovsky
2018-04-24 16:23   ` Eric W. Biederman
2018-04-24 16:23     ` Eric W. Biederman
2018-04-24 16:43     ` Andrey Grodzovsky
2018-04-24 16:43       ` Andrey Grodzovsky
2018-04-24 17:12       ` Eric W. Biederman
2018-04-25 13:55         ` Oleg Nesterov
2018-04-25 14:21           ` Andrey Grodzovsky
2018-04-25 14:21             ` Andrey Grodzovsky
2018-04-25 17:17             ` Oleg Nesterov
2018-04-25 18:40               ` Andrey Grodzovsky
2018-04-25 18:40                 ` Andrey Grodzovsky
2018-04-26  0:01                 ` Eric W. Biederman
2018-04-26 12:34                   ` Andrey Grodzovsky
2018-04-26 12:34                     ` Andrey Grodzovsky
2018-04-26 12:52                     ` Andrey Grodzovsky
2018-04-26 12:52                       ` Andrey Grodzovsky
2018-04-26 15:57                       ` Eric W. Biederman
2018-04-26 20:43                         ` Andrey Grodzovsky
2018-04-26 20:43                           ` Andrey Grodzovsky
2018-04-30 12:08                   ` Christian König
2018-04-30 12:08                     ` Christian König
2018-04-30 14:32                     ` Andrey Grodzovsky
2018-04-30 14:32                       ` Andrey Grodzovsky
2018-04-30 15:25                       ` Christian König
2018-04-30 15:25                         ` Christian König
2018-04-30 16:00                       ` Oleg Nesterov
2018-04-30 16:10                         ` Andrey Grodzovsky
2018-04-30 16:10                           ` Andrey Grodzovsky
2018-04-30 18:29                           ` Christian König
2018-04-30 18:29                             ` Christian König
2018-04-30 19:28                             ` Andrey Grodzovsky
2018-04-30 19:28                               ` Andrey Grodzovsky
2018-05-02 11:48                               ` Christian König
2018-05-02 11:48                                 ` Christian König
2018-05-17 11:18                                 ` Andrey Grodzovsky
2018-05-17 14:48                                   ` Michel Dänzer
2018-05-17 15:33                                     ` Andrey Grodzovsky
2018-05-17 15:52                                       ` Michel Dänzer
2018-05-17 19:05                                     ` Andrey Grodzovsky
2018-05-18  8:46                                       ` Michel Dänzer
2018-05-18  9:42                                         ` Christian König
2018-05-18 14:44                                           ` Michel Dänzer
2018-05-18 14:50                                             ` Christian König
2018-05-18 15:02                                               ` Andrey Grodzovsky
2018-05-22 12:58                                                 ` Christian König
2018-05-22 15:49                                         ` Andrey Grodzovsky
2018-05-22 16:09                                           ` Michel Dänzer
2018-05-22 16:30                                             ` Andrey Grodzovsky
2018-05-22 16:33                                               ` Michel Dänzer
2018-05-22 16:37                                                 ` Andrey Grodzovsky
2018-05-01 14:35                           ` Oleg Nesterov
2018-05-23 15:08                             ` Andrey Grodzovsky
2018-05-23 15:08                               ` Andrey Grodzovsky
2018-04-30 15:29                     ` Oleg Nesterov
2018-04-30 16:25                     ` Eric W. Biederman [this message]
2018-04-30 17:18                       ` Andrey Grodzovsky
2018-04-30 17:18                         ` Andrey Grodzovsky
2018-04-25 13:05   ` Oleg Nesterov
2018-04-24 15:30 ` [PATCH 3/3] drm/amdgpu: Switch to interrupted wait to recover from ring hang Andrey Grodzovsky
2018-04-24 15:30   ` Andrey Grodzovsky
2018-04-24 15:52   ` Panariti, David
2018-04-24 15:52     ` Panariti, David
2018-04-24 15:58     ` Andrey Grodzovsky
2018-04-24 15:58       ` Andrey Grodzovsky
2018-04-24 16:20       ` Panariti, David
2018-04-24 16:20         ` Panariti, David
2018-04-24 16:30         ` Eric W. Biederman
2018-04-24 16:30           ` Eric W. Biederman
2018-04-25 17:17           ` Andrey Grodzovsky
2018-04-25 17:17             ` Andrey Grodzovsky
2018-04-25 20:55             ` Eric W. Biederman
2018-04-25 20:55               ` Eric W. Biederman
2018-04-26 12:28               ` Andrey Grodzovsky
2018-04-26 12:28                 ` Andrey Grodzovsky
2018-04-24 16:14   ` Eric W. Biederman
2018-04-24 16:14     ` Eric W. Biederman
2018-04-24 16:38     ` Andrey Grodzovsky
2018-04-24 16:38       ` Andrey Grodzovsky
2018-04-30 11:34   ` Christian König
2018-04-30 11:34     ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k1so8xv8.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Andrey.Grodzovsky@amd.com \
    --cc=David.Panariti@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.