From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Tejun Heo <tj@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Wu Zhangjin <falcon@meizu.com>, Takashi Iwai <tiwai@suse.de>,
	Arjan van de Ven <arjan@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	hare@suse.com, Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Joseph Salisbury <joseph.salisbury@canonical.com>,
	Benjamin Poirier <bpoirier@suse.de>,
	Santosh Rastapur <santosh@chelsio.com>,
	Kay Sievers <kay@vrfy.org>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Tim Gardner <tim.gardner@canonical.com>,
	Pierre Fersing <pierre-fersing@pierref.org>,
	Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>,
	Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>,
	Sreekanth Reddy <sreekanth.reddy
Subject: Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
Date: Fri, 05 Sep 2014 11:14:08 +0200
Message-ID: <1409908448.5158.7.camel@marge.simpson.net>
In-Reply-To: <CAB=NE6Wti1RpAFk5q_YeZn2F9Rd=wsiwhyPszu74nG9fXwH5vQ@mail.gmail.com>

On Fri, 2014-09-05 at 00:47 -0700, Luis R. Rodriguez wrote: 
> On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
> > ...
> >> +             /*
> >> +              * I got SIGKILL, but wait for 60 more seconds for completion
> >> +              * unless chosen by the OOM killer. This delay is there as a
> >> +              * workaround for boot failure caused by SIGKILL upon device
> >> +              * driver initialization timeout.
> >> +              *
> >> +              * N.B. this will actually let the thread complete regularly,
> >> +              * wait_for_completion() will be used eventually, the 60 second
> >> +              * try here is just to check for the OOM over that time.
> >> +              */
> >> +             WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
> >> +                       "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
> >> +             for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
> >> +                     if (wait_for_completion_timeout(&done, HZ))
> >> +                             goto wait_done;
> >> +
> >
> > Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> > instead of 30?
> 
> Nope! I fell into the same trap and only with tons of patience by part
> of Tetsuo with me was I able to grok that the 60 seconds here are not
> for increasing the timeout, this is just time spent checking to ensure
> that the OOM wasn't the one who triggered the SIGKILL. Even if the
> drivers took eons it should be fine now, I tried it :D
> 
> >  Why do we even need this with the proposed async
> > probing changes?
> 
> Ah -- well without it the way we "find" drivers that need this new
> "async feature" is by a bug report and folks saying their system can't
> boot, or they say their device doesn't come up. That's all. Tracing
> this to systemd and a timeout was one of the ugliest things ever.
> There are two insane bug reports you can go check:
> 
> mptsas was the first:
> 
> http://article.gmane.org/gmane.linux.kernel/1669550
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

<quote>
(2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded
    30 seconds timeout. As a result, finit_module() of mptsas kernel
    module receives SIGKILL when waiting for error handler thread to be
    started.
</quote>

Hm.  Why is this not a systemd-udevd bug for running around killing
stuff when it has no idea whether progress is being made or not?
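
For reference, the loop in the quoted patch can be modelled in user space
roughly as below. This is only a sketch under stated assumptions: struct sim,
grace_window() and the outcome names are made up for illustration and are not
kernel APIs. oom_at stands in for test_thread_flag(TIF_MEMDIE) becoming true,
and each iteration stands in for one wait_for_completion_timeout(&done, HZ)
call. It covers the loop only, not whatever the patch does after it.

```c
#include <assert.h>

/* Hypothetical stand-in for the state the patch polls; not kernel code. */
struct sim {
	int done_at;	/* second at which the completion fires, -1 = never */
	int oom_at;	/* second at which the OOM flag gets set, -1 = never */
};

enum outcome {
	COMPLETED_IN_WINDOW,	/* completion arrived during the window */
	OOM_VICTIM,		/* OOM flag seen, loop stops early */
	WINDOW_EXPIRED		/* neither happened within the window */
};

/*
 * Model of:
 *	for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
 *		if (wait_for_completion_timeout(&done, HZ))
 *			goto wait_done;
 * *waited is set to the number of simulated seconds spent in the loop.
 */
static enum outcome grace_window(const struct sim *s, int window, int *waited)
{
	int i;

	assert(s && waited && window >= 0);

	for (i = 0; i < window; i++) {
		/* Loop condition: stop as soon as the OOM flag is set. */
		if (s->oom_at >= 0 && s->oom_at <= i) {
			*waited = i;
			return OOM_VICTIM;
		}
		/* One-second wait; succeeds if the completion fired by then. */
		if (s->done_at >= 0 && s->done_at <= i + 1) {
			*waited = i + 1;
			return COMPLETED_IN_WINDOW;
		}
	}
	*waited = i;
	return WINDOW_EXPIRED;
}
```

With window = 60, a completion at second 5 returns after 5 simulated seconds,
an OOM flag set at second 3 stops the loop at 3, and a probe that needs 600
seconds runs the window out without either event.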

-Mike
