From: Daniel Vetter <daniel@ffwll.ch>
To: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	David.Panariti@amd.com, "Michel Dänzer" <michel@daenzer.net>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	oleg@redhat.com, amd-gfx@lists.freedesktop.org,
	Alexander.Deucher@amd.com, akpm@linux-foundation.org,
	Christian.Koenig@amd.com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
Date: Wed, 25 Apr 2018 09:14:44 +0200
Message-ID: <20180425071444.GM25142@phenom.ffwll.local>
In-Reply-To: <27d7d15b-f7c3-2a0a-af85-eb243526ac88@amd.com>

On Tue, Apr 24, 2018 at 05:37:08PM -0400, Andrey Grodzovsky wrote:
> 
> 
> On 04/24/2018 05:21 PM, Eric W. Biederman wrote:
> > Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> writes:
> > 
> > > On 04/24/2018 03:44 PM, Daniel Vetter wrote:
> > > > On Tue, Apr 24, 2018 at 05:46:52PM +0200, Michel Dänzer wrote:
> > > > > Adding the dri-devel list, since this is driver independent code.
> > > > > 
> > > > > 
> > > > > On 2018-04-24 05:30 PM, Andrey Grodzovsky wrote:
> > > > > > Avoid calling wait_event_killable when you are possibly being called
> > > > > > from get_signal routine since in that case you end up in a deadlock
> > > > > > where you are alreay blocked in singla processing any trying to wait
> > > > > Multiple typos here, "[...] already blocked in signal processing and [...]"?
> > > > > 
> > > > > 
> > > > > > on a new signal.
> > > > > > 
> > > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > ---
> > > > > >    drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
> > > > > >    1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > index 088ff2b..09fd258 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > > > @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
> > > > > >    		return;
> > > > > >    	/**
> > > > > >    	 * The client will not queue more IBs during this fini, consume existing
> > > > > > -	 * queued IBs or discard them on SIGKILL
> > > > > > +	 * queued IBs or discard them when in death signal state since
> > > > > > +	 * wait_event_killable can't receive signals in that state.
> > > > > >    	*/
> > > > > > -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
> > > > > > +	if (current->flags & PF_SIGNALED)
> > > > You want fatal_signal_pending() here, instead of inventing your own broken
> > > > version.
> > > I rely on current->flags & PF_SIGNALED because this is set from
> > > within get_signal,
> > It doesn't mean that.  Unless you are called by do_coredump (you
> > aren't).
> 
> Looking at the latest code here
> https://elixir.bootlin.com/linux/v4.17-rc2/source/kernel/signal.c#L2449
> I see that current->flags |= PF_SIGNALED; is outside of the
> if (sig_kernel_coredump(signr)) {...} scope

Ok I read some more about this, and I guess you go through process exit
and then eventually close. But I'm not sure.

The code in drm_sched_entity_fini also looks strange: You unpark the
scheduler thread before you remove all the IBs. At least from the comment
that doesn't sound like what you want to do.

But in general, PF_SIGNALED is really something deeply internal to the
core (used for some book-keeping and accounting). The drm scheduler is the
only thing looking at it, so this smells like a layering violation. I suspect
(but without knowing what you're actually trying to achieve here I can't be
sure) you want to look at something else.

E.g. PF_EXITING seems to be used in a lot more places to cancel stuff
that's no longer relevant when a task exits, not PF_SIGNALED. There's the
TIF_MEMDIE flag if you're hacking around issues with the oom-killer.
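
To make the difference concrete, here is a rough userspace model, not
kernel code. The MODEL_* constants mirror what I believe are the values of
PF_EXITING and PF_SIGNALED in include/linux/sched.h; struct model_task and
the three helpers are made up for illustration, and the "don't wait once
the task is exiting" policy is only an assumption about what the scheduler
wants here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: these constants mirror the kernel's PF_EXITING
 * and PF_SIGNALED flag values, but nothing here touches a real
 * task_struct.
 */
#define MODEL_PF_EXITING  0x00000004u
#define MODEL_PF_SIGNALED 0x00000400u

/* Hypothetical stand-in for the one task_struct field we care about. */
struct model_task {
	unsigned int flags;
};

/* The check from the patch: true only if a signal killed the task. */
static bool model_killed_by_signal(const struct model_task *t)
{
	assert(t != NULL);
	return (t->flags & MODEL_PF_SIGNALED) != 0;
}

/* The alternative: true for every task that is on its way out. */
static bool model_is_exiting(const struct model_task *t)
{
	assert(t != NULL);
	return (t->flags & MODEL_PF_EXITING) != 0;
}

/* Assumed policy: never start a signal-based wait for an exiting task. */
static bool model_may_wait_killable(const struct model_task *t)
{
	return !model_is_exiting(t);
}
```

In this model, a task that exits without having been killed has only
MODEL_PF_EXITING set, so the PF_SIGNALED-style check does not catch it
while the PF_EXITING-style check does.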

This here on the other hand looks really fragile, and probably only does
what you want to do by accident.
-Daniel

> 
> Andrey
> 
> > The closing of files does not happen in do_coredump.
> > Which means you are being called from do_exit.
> > In fact you are being called after exit_files which closes
> > the files.  The actual __fput processing happens in task_work_run.
> > 
> > > meaning I am within signal processing, in which case I want to avoid
> > > any signal-based wait for that task.
> > > From what I see in the code, task_struct.pending.signal is being set
> > > for other threads in the same
> > > group (zap_other_threads) or for other scenarios; those tasks are still
> > > able to receive signals,
> > > so calling wait_event_killable there will not be a problem.
> > Except that you are being called from do_exit and after exit_files,
> > which is after exit_signal.  Which means that PF_EXITING has been set.
> > Which implies that the kernel signal handling machinery has already
> > started being torn down.
> > 
> > Not as much as I would like to happen at that point as we are still
> > left with some old CLONE_PTHREAD messes in the code that need to be
> > cleaned up.
> > 
> > Still, given the fact that you are in task_work_run, it is quite possible
> > that even release_task has been run on that task before the
> > f_op->release method is called.  So you simply cannot count on signals working.
> > 
> > Which in practice leaves a timeout for ending your wait.  That code can
> > legitimately be in a context that is neither interruptible nor killable.
> > 
> > > > > >    		entity->fini_status = -ERESTARTSYS;
> > > > > >    	else
> > > > > >    		entity->fini_status = wait_event_killable(sched->job_scheduled,
> > > > But really this smells like a bug in wait_event_killable, since
> > > > wait_event_interruptible does not suffer from the same bug. It will return
> > > > immediately when there's a signal pending.
> > > Even when wait_event_interruptible is called as follows -
> > > ...->do_signal->get_signal->....->wait_event_interruptible ?
> > > I haven't tried it, but wait_event_interruptible is very similar to
> > > wait_event_killable, so I would assume it will also
> > > not be interrupted if called like that. (Will give it a try just out
> > > of curiosity anyway)
> > As PF_EXITING is set want_signal should fail and the signal state of the
> > task should not be updatable by signals.
> > 
> > Eric
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
