netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
	syzbot <syzbot+4bfbbf28a2e50ab07368@syzkaller.appspotmail.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	David Miller <davem@davemloft.net>,
	eladr@mellanox.com, Ido Schimmel <idosch@mellanox.com>,
	Jiri Pirko <jiri@mellanox.com>,
	John Stultz <john.stultz@linaro.org>,
	linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: INFO: rcu detected stall in ext4_write_checks
Date: Sat, 6 Jul 2019 00:28:01 -0400	[thread overview]
Message-ID: <20190706042801.GD11665@mit.edu> (raw)
In-Reply-To: <20190705191055.GT26519@linux.ibm.com>

On Fri, Jul 05, 2019 at 12:10:55PM -0700, Paul E. McKenney wrote:
> 
> Exactly, so although my patch might help for CONFIG_PREEMPT=n, it won't
> help in your scenario.  But looking at the dmesg from your URL above,
> I see the following:

I just tested with CONFIG_PREEMPT=n

% grep CONFIG_PREEMPT /build/ext4-64/.config
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPTIRQ_EVENTS is not set

And with your patch, it's still not helping.

I think that's because SCHED_DEADLINE is a real-time style scheduler:

       In  order  to fulfill the guarantees that are made when a thread is ad‐
       mitted to the SCHED_DEADLINE policy,  SCHED_DEADLINE  threads  are  the
       highest  priority  (user  controllable)  threads  in the system; if any
       SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled
       under one of the other policies.

So a SCHED_DEADLINE process is not going yield control of the CPU,
even if it calls cond_resched() until the thread has run for more than
the sched_runtime parameter --- which for the syzkaller repro, was set
at 26 days.

There are some safety checks when using SCHED_DEADLINE:

       The kernel requires that:

           sched_runtime <= sched_deadline <= sched_period

       In  addition,  under  the  current implementation, all of the parameter
       values must be at least 1024 (i.e., just over one microsecond, which is
       the  resolution  of the implementation), and less than 2^63.  If any of
       these checks fails, sched_setattr(2) fails with the error EINVAL.

       The  CBS  guarantees  non-interference  between  tasks,  by  throttling
       threads that attempt to over-run their specified Runtime.

       To ensure deadline scheduling guarantees, the kernel must prevent situ‐
       ations where the set of SCHED_DEADLINE threads is not feasible (schedu‐
       lable)  within  the given constraints.  The kernel thus performs an ad‐
       mittance test when setting or changing SCHED_DEADLINE  policy  and  at‐
       tributes.   This admission test calculates whether the change is feasi‐
       ble; if it is not, sched_setattr(2) fails with the error EBUSY.

The problem is that SCHED_DEADLINE is designed for sporadic tasks:

       A  sporadic  task is one that has a sequence of jobs, where each job is
       activated at most once per period.  Each job also has a relative  dead‐
       line,  before which it should finish execution, and a computation time,
       which is the CPU time necessary for executing the job.  The moment when
       a  task wakes up because a new job has to be executed is called the ar‐
       rival time (also referred to as the request time or release time).  The
       start time is the time at which a task starts its execution.  The abso‐
       lute deadline is thus obtained by adding the relative deadline  to  the
       arrival time.

It appears that kernel's admission control before allowing
SCHED_DEADLINE to be set on a thread was designed for sane
applications, and not abusive ones.  Given that process started doing
abusive things *after* SCHED_DEADLINE policy was set, in order kernel
to figure out that in fact SCHED_DEADLINE should be denied for any
arbitrary kernel thread would require either (a) solving the halting
problem, or (b) being able to anticipate the future (in which case,
we should be using that kernel algorithm to play the stock market  :-)

    	    	    	      	       - Ted

  reply	other threads:[~2019-07-06  4:28 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-26 17:27 INFO: rcu detected stall in ext4_write_checks syzbot
2019-06-26 18:42 ` Theodore Ts'o
2019-06-26 21:03   ` Theodore Ts'o
2019-06-26 22:47     ` Theodore Ts'o
2019-07-05 13:24       ` Dmitry Vyukov
2019-07-05 15:16         ` Paul E. McKenney
2019-07-05 15:47           ` Amir Goldstein
2019-07-05 15:48           ` Dmitry Vyukov
2019-07-05 19:10             ` Paul E. McKenney
2019-07-06  4:28               ` Theodore Ts'o [this message]
2019-07-06  6:16                 ` Paul E. McKenney
2019-07-06 15:02                   ` Theodore Ts'o
2019-07-06 18:03                     ` Paul E. McKenney
2019-07-07  1:16                       ` Paul E. McKenney
2019-07-14 14:48                         ` Dmitry Vyukov
2019-07-14 18:49                           ` Paul E. McKenney
2019-07-15 13:29                             ` Peter Zijlstra
2019-07-15 13:33                               ` Dmitry Vyukov
2019-07-15 13:46                                 ` Peter Zijlstra
2019-07-15 14:02                                   ` Paul E. McKenney
2019-07-22 10:03                                   ` Dmitry Vyukov
2019-07-23  8:51                                     ` Dmitry Vyukov
2019-07-14 19:05                           ` Theodore Ts'o
2019-07-14 19:29                             ` Paul E. McKenney
2019-07-15  3:10                               ` Paul E. McKenney
2019-07-15 13:01                                 ` Paul E. McKenney
2019-07-15 13:29                                   ` Dmitry Vyukov
2019-07-15 13:39                                   ` Peter Zijlstra
2019-07-15 14:03                                     ` Paul E. McKenney
2019-07-15 13:22                         ` Peter Zijlstra
2019-07-05 13:18   ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190706042801.GD11665@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=davem@davemloft.net \
    --cc=dvyukov@google.com \
    --cc=eladr@mellanox.com \
    --cc=idosch@mellanox.com \
    --cc=jiri@mellanox.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=syzbot+4bfbbf28a2e50ab07368@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).