All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Liu Chuansheng <chuansheng.liu@intel.com>,
	Valdis Kletnieks <valdis.kletnieks@vt.edu>,
	linux-kernel@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.
Date: Thu, 16 May 2019 13:57:58 +0200	[thread overview]
Message-ID: <20190516115758.6v7oitg3vbkfhh5j@pathway.suse.cz> (raw)
In-Reply-To: <ee7501c6-d996-1684-1652-f0f838ba69c3@i-love.sakura.ne.jp>

CCed Stephen to discuss linux-next related question at the bottom
of the mail.

On Thu 2019-05-16 17:19:12, Tetsuo Handa wrote:
> On 2019/05/15 19:55, Petr Mladek wrote:
> >> +	if (!stamp) {
> >> +		stamp = jiffies;
> >> +		if (!stamp)
> >> +			stamp++;
> >> +		t->killed_time = stamp;
> >> +		return;
> >> +	}
> > 
> > I might be too dumb but the above code looks pretty tricky to me.
> > It would deserve a comment. Or better, I would remove
> > trick to handle overflow. If it happens, we would just
> > lose one check period.
> 
> We can use
> 
>   static inline unsigned long jiffies_nonzero(void)
>   {
>       const unsigned long stamp = jiffies;
> 
>       return stamp ? stamp : -1;
>   }
> 
> or even shortcut "jiffies | 1" because difference by one jiffie
> is an measurement error for multiple HZ of timeout.

I would just ignore the overflow. We would just start measuring
the timeout in the next check_hung_task() call. It is not
a big deal and removes few lines of a tricky code.

> >> +	if (time_is_after_jiffies(stamp + timeout * HZ))
> >> +		return;
> >> +	trace_sched_process_hang(t);
> >> +	if (sysctl_hung_task_panic) {
> >> +		console_verbose();
> >> +		hung_task_call_panic = true;
> > 
> > IMHO, the delayed task exit is much less fatal than sleeping
> > in an uninterruptible state.
> > 
> > Anyway, the check is much less reliable. In case of hung_task,
> > it is enough when the task gets scheduled. In the new check,
> > the task has to do some amount of work until the signal
> > gets handled and do_exit() is called.
> > 
> > The panic should either get enabled separately or we should
> > never panic in this case.
> 
> OK, we should not share existing sysctl settings.
> 
> But in the context of syzbot's testing where there are only 2 CPUs
> in the target VM (which means that only small number of threads and
> not so much memory) and threads get SIGKILL after 5 seconds from fork(),
> being unable to reach do_exit() within 10 seconds is likely a sign of
> something went wrong. For example, 6 out of 7 trials of a reproducer for
> https://syzkaller.appspot.com/bug?id=835a0b9e75b14b55112661cbc61ca8b8f0edf767
> resulted in "no output from test machine" rather than "task hung".
> This patch is revealing that such killed threads are failing to reach
> do_exit() because they are trapped at unkillable retry loop due to a
> race bug.
> 
> Therefore, I would like to try this patch in linux-next.git for feasibility
> testing whether this patch helps finding more bugs and reproducers for such
> bugs, by bringing "unable to terminate threads" reports out of "no output from
> test machine" reports. We can add sysctl settings before sending to linux.git.

In this case, the watchdog should get enabled on with
CONFIG_DEBUG_AID_FOR_SYZBOT

Also we should ask/inform Stephen about this. I am not sure
if he is willing to resolve eventual conflicts for these
syzboot-specific patches that are not upstream candidates.

A solution might be to create sysbot-specific for-next branch
that Stephen might simply ignore when there are conflicts.
And you would be responsible for updating it.

Best Regards,
Petr

  reply	other threads:[~2019-05-16 11:58 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-13 11:02 [PATCH] kernel/hung_task.c: Monitor killed tasks Tetsuo Handa
2019-05-13 11:11 ` Dmitry Vyukov
2019-05-14 22:28 ` Paul E. McKenney
2019-05-15 10:55 ` Petr Mladek
2019-05-16  8:19   ` Tetsuo Handa
2019-05-16 11:57     ` Petr Mladek [this message]
2019-05-16 12:38       ` Tetsuo Handa
2019-05-22 12:38         ` Tetsuo Handa
2019-05-22 13:41           ` Stephen Rothwell
2019-05-22 14:58             ` Tetsuo Handa
2019-05-22 21:09               ` Tetsuo Handa
2019-05-22 21:39                 ` Stephen Rothwell
2019-05-22 21:43                   ` Andrew Morton
2019-05-22 23:46                     ` Tetsuo Handa
2019-05-27 14:12         ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516115758.6v7oitg3vbkfhh5j@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=chuansheng.liu@intel.com \
    --cc=dvyukov@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=peterz@infradead.org \
    --cc=sfr@canb.auug.org.au \
    --cc=valdis.kletnieks@vt.edu \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.