From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Tejun Heo <tj@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Dmitry Torokhov <dmitry.torokhov@gmail.com>,
Wu Zhangjin <falcon@meizu.com>, Takashi Iwai <tiwai@suse.de>,
Arjan van de Ven <arjan@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>,
hare@suse.com, Andrew Morton <akpm@linux-foundation.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Joseph Salisbury <joseph.salisbury@canonical.com>,
Benjamin Poirier <bpoirier@suse.de>,
Santosh Rastapur <santosh@chelsio.com>,
Kay Sievers <kay@vrfy.org>,
One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
Tim Gardner <tim.gardner@canonical.com>,
Pierre Fersing <pierre-fersing@pierref.org>,
Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>,
Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>,
Sreekanth Reddy <sreekanth.reddy
Subject: Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
Date: Fri, 05 Sep 2014 11:14:08 +0200
Message-ID: <1409908448.5158.7.camel@marge.simpson.net>
In-Reply-To: <CAB=NE6Wti1RpAFk5q_YeZn2F9Rd=wsiwhyPszu74nG9fXwH5vQ@mail.gmail.com>
On Fri, 2014-09-05 at 00:47 -0700, Luis R. Rodriguez wrote:
> On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
> > ...
> >> +	/*
> >> +	 * I got SIGKILL, but wait for 60 more seconds for completion
> >> +	 * unless chosen by the OOM killer. This delay is there as a
> >> +	 * workaround for boot failure caused by SIGKILL upon device
> >> +	 * driver initialization timeout.
> >> +	 *
> >> +	 * N.B. this will actually let the thread complete regularly,
> >> +	 * wait_for_completion() will be used eventually, the 60 second
> >> +	 * try here is just to check for the OOM over that time.
> >> +	 */
> >> +	WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
> >> +		  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
> >> +	for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
> >> +		if (wait_for_completion_timeout(&done, HZ))
> >> +			goto wait_done;
> >> +
> >
> > Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> > instead of 30?
>
> Nope! I fell into the same trap and only with tons of patience by part
> of Tetsuo with me was I able to grok that the 60 seconds here are not
> for increasing the timeout, this is just time spent checking to ensure
> that the OOM wasn't the one who triggered the SIGKILL. Even if the
> drivers took eons it should be fine now, I tried it :D
>
> > Why do we even need this with the proposed async
> > probing changes?
>
> Ah -- well without it the way we "find" drivers that need this new
> "async feature" is by a bug report and folks saying their system can't
> boot, or they say their device doesn't come up. That's all. Tracing
> this to systemd and a timeout was one of the most ugliest things ever.
> There two insane bug reports you can go check:
>
> mptsas was the first:
>
> http://article.gmane.org/gmane.linux.kernel/1669550
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
<quote>
(2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded
30 seconds timeout. As a result, finit_module() of mptsas kernel
module receives SIGKILL when waiting for error handler thread to be
started.
</quote>
Hm. Why is this not a systemd-udevd bug for running around killing
stuff when it has no idea whether progress is being made or not?
-Mike
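
For anyone following the quoted hunk, here is a rough user-space sketch of what that bounded loop does, since it tripped up more than one reader. It is not kernel code: `struct sim`, `sim_memdie()`, `sim_wait_one_second()` and `sim_bounded_wait()` are hypothetical stand-ins for `test_thread_flag(TIF_MEMDIE)`, `wait_for_completion_timeout(&done, HZ)` and the loop itself, and time is simulated in whole seconds. It models only the loop shown above; what the patch does after the loop is not in the quoted hunk, so it is left out.

```c
#include <stdbool.h>

/* Hypothetical simulation state; none of these names are kernel API. */
struct sim {
	int now;	/* simulated seconds elapsed since SIGKILL arrived */
	int done_at;	/* second at which the awaited completion fires */
	int oom_at;	/* second at which the OOM killer picks us, -1 = never */
};

enum loop_exit {
	COMPLETED_IN_WINDOW,	/* the "goto wait_done" path */
	OOM_SELECTED,		/* loop stopped because TIF_MEMDIE was seen */
	WINDOW_EXPIRED,		/* 60 one-second tries used up, no OOM seen */
};

/* Stand-in for test_thread_flag(TIF_MEMDIE). */
static bool sim_memdie(const struct sim *s)
{
	return s->oom_at >= 0 && s->now >= s->oom_at;
}

/* Stand-in for wait_for_completion_timeout(&done, HZ): one second passes. */
static bool sim_wait_one_second(struct sim *s)
{
	s->now++;
	return s->now >= s->done_at;
}

/* Same control flow as the quoted for loop, reporting why it ended. */
static enum loop_exit sim_bounded_wait(struct sim *s)
{
	int i;

	for (i = 0; i < 60 && !sim_memdie(s); i++)
		if (sim_wait_one_second(s))
			return COMPLETED_IN_WINDOW;

	return sim_memdie(s) ? OOM_SELECTED : WINDOW_EXPIRED;
}
```

With `done_at = 200` and no OOM, the loop returns `WINDOW_EXPIRED` at second 60 without giving up on the work; per the quoted comment, the caller then goes on to a plain `wait_for_completion()`. With `oom_at = 3` it stops at second 3.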