All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: oleg@redhat.com, joseph.salisbury@canonical.com
Cc: JBottomley@parallels.com, Nagalakshmi.Nandigama@lsi.com,
	Sreekanth.Reddy@lsi.com, rientjes@google.com,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	tj@kernel.org, tglx@linutronix.de, linux-kernel@vger.kernel.org,
	kernel-team@lists.ubuntu.com, linux-scsi@vger.kernel.org
Subject: Re: [v3.13][v3.14][Regression] kthread:makekthread_create()killable
Date: Wed, 19 Mar 2014 20:49:25 +0900	[thread overview]
Message-ID: <201403192049.BBI39025.OVFMOOJtFSHFQL@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20140318171620.GA10636@redhat.com>

Oleg Nesterov wrote:
> > > If we need the urgent hack to fix the regression, then I suggest to change
> > > scsi_host_alloc() temporary until mptsas (or whatever) is fixed.
> >
> > Device initialization taking longer than 30 seconds is possible and is not a
> > hang up. It is systemd which needs to be fixed.
> 
> Perhaps systemd needs the fix too, I do not know. But this is irrelevant,
> I think. Or at least this should be discussed separately.

I confirmed that this problem goes away if systemd-udevd supports longer
timeout.

> 
> kthread_run() can fail anyway, mptsas_probe() should not crash the kernel.

Right. But mptsas_probe() triggering an OOPS is irrelevant to kthread_run()
( comment #27 ).

> 
> And btw, it is not clear to me if in this case device initialization really
> needs more than 30 seconds... My understanding is probably wrong, so please
> correct me. But it seems that before your "make kthread_create() killable"
> 
> 	- probe hangs
> 
> 	- SIGKILL wakes it up
> 
> 	- so I assume that the probe was interrupted and didn't finish
> 	  correctly ???
> 
> 	- initialization continues, does scsi_host_alloc(), etc, and
> 	  everything works fine even if probe was interrupted?
> 

I confirmed that device initialization really took more than 30 seconds
( comments #51 and #52 ).

> So perhaps that probe should not hang and this should be fixed too ?
> Do you know where exactly it hangs? And where it is woken up by SIGKILL ?
> Or I totally misunderstood ?

The probe did not hang. SIGKILL affected only wait_for_completion_killable()
in kthread_create_on_node() called by mptsas_probe() via scsi_host_alloc().
Thus, the probe was interrupted because kthread_run() returned an error.

> > I think we need a bit different version, in order to take TIF_MEMDIE flag into
> > account at the caller of kthread_create(), for the purpose of commit 786235ee
> > is "try to die as soon as possible if chosen by the OOM killer".
> >
> > 	for (;;) {
> > 		shost->ehandler = kthread_run(scsi_error_handler, shost,
> > 					      "scsi_eh_%d", shost->host_no);
> > 		if (PTR_ERR(shost->ehandler) != -EINTR ||
> > 		    test_thread_flag(TIF_MEMDIE))
> 
> Well, personally I do not care about TIF_MEMDIE/oom at all. We need the
> temporary hack (unless we have the "right" fix right now) which should be
> reverted later.

I do seriously care about TIF_MEMDIE/oom. Last week I respond to a trouble
which hit "kernel: request_module() OOM local DoS" (RHBZ #853474) without
any malice.

> Not sure I understand... Yes, wait_for_completion_killable() can return
> immediately if TIF_SIGPENDING will be set again for any reason. Say, another
> signal. But the next iteration will clear TIF_SIGPENDING ?
> 
> >      As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets
> >      TIF_SIGPENDING flag
> 
> Ah, I see, you mean that kmalloc() can do this every time. No, this should
> not happen or we have another problem.

Then, what happens if somebody does

  while (1)
      kill(pid, SIGKILL);

where pid is the process calling kthread_run() from the "for (;;)" loop in
scsi_host_alloc()? Theoretically, it will form an infinite retry loop.
Clearing TIF_SIGPENDING does not guarantee that next
wait_for_completion_killable() does not return immediately.
Doing retry decision at scsi_host_alloc() will make things worse than
doing it at kthread_create_on_node().

> Anyway. I agree with any hack in scsi_host_alloc/etc, this is up to
> maintainers. I still think that your change uncovered the problems in
> drivers/message/fusion/, these problems should be fixed somehow.
> 
> Dear maintainers, we need your help.
> 

Right. We found that we can fix this problem by updating systemd-udevd to
support longer timeout ( comment #53 ). Joseph, would you consult systemd
maintainers?

  reply	other threads:[~2014-03-19 11:50 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-14 20:46 [v3.13][v3.14][Regression] kthread: make kthread_create() killable Joseph Salisbury
2014-03-15  0:43 ` Tetsuo Handa
2014-03-16 15:13   ` Tetsuo Handa
2014-03-16 16:25     ` Oleg Nesterov
2014-03-17 12:38       ` [v3.13][v3.14][Regression] kthread: make kthread_create()killable Tetsuo Handa
2014-03-17 14:22         ` Oleg Nesterov
2014-03-18 12:03           ` [v3.13][v3.14][Regression] kthread: makekthread_create()killable Tetsuo Handa
2014-03-18 17:16             ` Oleg Nesterov
2014-03-19 11:49               ` Tetsuo Handa [this message]
2014-03-19 16:13                 ` [v3.13][v3.14][Regression] kthread:makekthread_create()killable Joseph Salisbury
2014-03-19 17:52                   ` Oleg Nesterov
2014-03-19 18:29                     ` please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable) Oleg Nesterov
2014-03-19 19:42                       ` Oleg Nesterov
2014-03-19 21:04                         ` Joseph Salisbury
2014-03-20 16:46                         ` Joseph Salisbury
2014-03-20 19:23                           ` Oleg Nesterov
2014-03-21 18:34                             ` Oleg Nesterov
2014-03-21 19:32                               ` Linus Torvalds
2014-03-21 20:31                                 ` Oleg Nesterov
2014-03-21 22:56                                 ` James Bottomley
2014-03-22  6:25                               ` please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable) Tetsuo Handa
2014-03-22 19:25                                 ` Oleg Nesterov
2014-03-22 20:48                                   ` James Bottomley
2014-03-24 17:01                                     ` Oleg Nesterov
2014-03-22 21:25                                   ` Thomas Gleixner
2014-03-22 22:01                                     ` Thomas Gleixner
2014-03-22 23:57                                       ` please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable) Tetsuo Handa
2014-03-23  8:04                                         ` Thomas Gleixner
2014-03-23 14:19                                           ` James Bottomley
2014-03-23 14:28                                             ` Thomas Gleixner
2014-03-23 14:29                                               ` James Bottomley
2014-03-22 23:50                                   ` Tetsuo Handa
2014-03-17 20:02 ` [v3.13][v3.14][Regression] kthread: make kthread_create() killable Andrew Morton
2014-03-17 20:19   ` Oleg Nesterov
2014-03-17 20:39     ` Andrew Morton
2014-03-18 17:45       ` Oleg Nesterov
2014-06-03 13:03     ` [PATCH] kthread: Fix return value of kthread_create() upon SIGKILL Tetsuo Handa
2014-06-03 21:35       ` David Rientjes
2014-03-17 21:32   ` [v3.13][v3.14][Regression] kthread: make kthread_create()killable Tetsuo Handa
2014-03-17 23:18   ` [v3.13][v3.14][Regression] kthread: make kthread_create() killable One Thousand Gnomes
2014-03-18 17:50     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201403192049.BBI39025.OVFMOOJtFSHFQL@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=JBottomley@parallels.com \
    --cc=Nagalakshmi.Nandigama@lsi.com \
    --cc=Sreekanth.Reddy@lsi.com \
    --cc=akpm@linux-foundation.org \
    --cc=joseph.salisbury@canonical.com \
    --cc=kernel-team@lists.ubuntu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.