All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Liu Chuansheng <chuansheng.liu@intel.com>,
	Valdis Kletnieks <valdis.kletnieks@vt.edu>,
	linux-kernel@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.
Date: Thu, 16 May 2019 17:19:12 +0900	[thread overview]
Message-ID: <ee7501c6-d996-1684-1652-f0f838ba69c3@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20190515105540.vyzh6n62rqi5imqv@pathway.suse.cz>

On 2019/05/15 19:55, Petr Mladek wrote:
>> +	if (!stamp) {
>> +		stamp = jiffies;
>> +		if (!stamp)
>> +			stamp++;
>> +		t->killed_time = stamp;
>> +		return;
>> +	}
> 
> I might be too dumb but the above code looks pretty tricky to me.
> It would deserve a comment. Or better, I would remove
> trick to handle overflow. If it happens, we would just
> lose one check period.

We can use

  static inline unsigned long jiffies_nonzero(void)
  {
      const unsigned long stamp = jiffies;

      return stamp ? stamp : -1;
  }

or even shortcut "jiffies | 1" because difference by one jiffie
is an measurement error for multiple HZ of timeout.

> 
> Alternative solution would be to set the timestamp in
> complete_signal(). Then we would know that the timestamp
> is always valid when a fatal signal is pending.

Yes. But I guess that since signal might be delivered just before
setting PF_FROZEN and the thread might be kept frozen for longer
than timeout, we will need to reset the timestamp just before
clearing PF_FROZEN.



>> +	if (time_is_after_jiffies(stamp + timeout * HZ))
>> +		return;
>> +	trace_sched_process_hang(t);
>> +	if (sysctl_hung_task_panic) {
>> +		console_verbose();
>> +		hung_task_call_panic = true;
> 
> IMHO, the delayed task exit is much less fatal than sleeping
> in an uninterruptible state.
> 
> Anyway, the check is much less reliable. In case of hung_task,
> it is enough when the task gets scheduled. In the new check,
> the task has to do some amount of work until the signal
> gets handled and do_exit() is called.
> 
> The panic should either get enabled separately or we should
> never panic in this case.

OK, we should not share existing sysctl settings.

But in the context of syzbot's testing where there are only 2 CPUs
in the target VM (which means that only small number of threads and
not so much memory) and threads get SIGKILL after 5 seconds from fork(),
being unable to reach do_exit() within 10 seconds is likely a sign of
something went wrong. For example, 6 out of 7 trials of a reproducer for
https://syzkaller.appspot.com/bug?id=835a0b9e75b14b55112661cbc61ca8b8f0edf767
resulted in "no output from test machine" rather than "task hung".
This patch is revealing that such killed threads are failing to reach
do_exit() because they are trapped at unkillable retry loop due to a
race bug.

Therefore, I would like to try this patch in linux-next.git for feasibility
testing whether this patch helps finding more bugs and reproducers for such
bugs, by bringing "unable to terminate threads" reports out of "no output from
test machine" reports. We can add sysctl settings before sending to linux.git.

Any questions?


  reply	other threads:[~2019-05-16  8:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-13 11:02 [PATCH] kernel/hung_task.c: Monitor killed tasks Tetsuo Handa
2019-05-13 11:11 ` Dmitry Vyukov
2019-05-14 22:28 ` Paul E. McKenney
2019-05-15 10:55 ` Petr Mladek
2019-05-16  8:19   ` Tetsuo Handa [this message]
2019-05-16 11:57     ` Petr Mladek
2019-05-16 12:38       ` Tetsuo Handa
2019-05-22 12:38         ` Tetsuo Handa
2019-05-22 13:41           ` Stephen Rothwell
2019-05-22 14:58             ` Tetsuo Handa
2019-05-22 21:09               ` Tetsuo Handa
2019-05-22 21:39                 ` Stephen Rothwell
2019-05-22 21:43                   ` Andrew Morton
2019-05-22 23:46                     ` Tetsuo Handa
2019-05-27 14:12         ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee7501c6-d996-1684-1652-f0f838ba69c3@i-love.sakura.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=chuansheng.liu@intel.com \
    --cc=dvyukov@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=valdis.kletnieks@vt.edu \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.