All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Byungchul Park <max.byungchul.park@gmail.com>,
	Byungchul Park <byungchul.park@lge.com>,
	rcu <rcu@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	kernel-team@lge.com
Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
Date: Thu, 18 Jul 2019 14:34:19 -0700	[thread overview]
Message-ID: <20190718213419.GV14271@linux.ibm.com> (raw)
In-Reply-To: <CAEXW_YTcL-nOfJXkChGhvQtqqfSLpAYr327PLu1SmGEEADCevw@mail.gmail.com>

On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote:
> Trimming the list a bit to keep my noise level low,
> 
> On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> [snip]
> > > It still feels like you guys are hyperfocusing on this one particular
> > > > knob.  I instead need you to look at the interrelating knobs as a group.
> > >
> > > Thanks for the hints, we'll do that.
> > >
> > > > On the debugging side, suppose someone gives you an RCU bug report.
> > > > What information will you need?  How can you best get that information
> > > > without excessive numbers of over-and-back interactions with the guy
> > > > reporting the bug?  As part of this last question, what information is
> > > > normally supplied with the bug?  Alternatively, what information are
> > > > bug reporters normally expected to provide when asked?
> > >
> > > I suppose I could dig out some of our Android bug reports of the past where
> > > there were RCU issues but if there's any fires you are currently fighting do
> > > send it our way as debugging homework ;-)
> >
> >   Suppose that you were getting RCU CPU stall
> > warnings featuring multi_cpu_stop() called from cpu_stopper_thread().
> > Of course, this really means that some other CPU/task is holding up
> > multi_cpu_stop() without also blocking the current grace period.
> >
> 
> So I took a shot at this trying to learn how CPU stoppers work in
> relation to this problem.
> 
> I am assuming here say CPU X has entered MULTI_STOP_DISABLE_IRQ state
> in multi_cpu_stop() but another CPU Y has not yet entered this state.
> So CPU X is stalling RCU but it is really because of CPU Y. Now in the
> problem statement, you mentioned CPU Y is not holding up the grace
> period, which means Y doesn't have any of IRQ, BH or preemption
> disabled ; but is still somehow stalling RCU indirectly by troubling
> X.
> 
> This can only happen if :
> - CPU Y has a thread executing on it that is higher priority than CPU
> X's stopper thread which prevents it from getting scheduled. - but the
> CPU stopper thread (migration/..) is highest priority RT so this would
> be some kind of an odd scheduler bug.
> - There is a bug in the CPU stopper machinery itself preventing it
> from scheduling the stopper on Y. Even though Y is not holding up the
> grace period.

- CPU Y might have already passed through its quiescent state for
  the current grace period, then disabled IRQs indefinitely.
  Now, CPU Y would block a later grace period, but CPU X is
  preventing the current grace period from ending, so no such
  later grace period can start.

> Did I get that right? Would be exciting to run the rcutorture test
> once Paul has it available to reproduce this problem.

Working on it!  Slow, I know!

							Thanx, Paul


  parent reply	other threads:[~2019-07-18 21:34 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-08  6:00 [PATCH] rcu: Make jiffies_till_sched_qs writable Byungchul Park
2019-07-08 12:50 ` Paul E. McKenney
2019-07-08 13:03   ` Joel Fernandes
2019-07-08 13:19     ` Paul E. McKenney
2019-07-08 14:15       ` Joel Fernandes
2019-07-09  6:05       ` Byungchul Park
2019-07-09 12:43         ` Paul E. McKenney
2019-07-09  5:58     ` Byungchul Park
2019-07-09  6:45       ` Byungchul Park
2019-07-09 12:41       ` Paul E. McKenney
2019-07-10  1:20         ` Byungchul Park
2019-07-11 12:30           ` Paul E. McKenney
2019-07-11 13:08             ` Joel Fernandes
2019-07-11 15:02               ` Paul E. McKenney
2019-07-11 16:48                 ` Joel Fernandes
2019-07-11 19:58                   ` Joel Fernandes
2019-07-12  6:32                     ` Byungchul Park
2019-07-12 12:51                       ` Joel Fernandes
2019-07-12 13:02                         ` Paul E. McKenney
2019-07-12 13:43                           ` Joel Fernandes
2019-07-12 14:53                             ` Paul E. McKenney
2019-07-13  8:47                         ` Byungchul Park
2019-07-13 14:20                           ` Joel Fernandes
2019-07-13 15:13                             ` Paul E. McKenney
2019-07-13 15:42                               ` Joel Fernandes
2019-07-13 17:41                                 ` Paul E. McKenney
2019-07-14 13:39                                   ` Byungchul Park
2019-07-14 13:56                                     ` Paul E. McKenney
2019-07-15 17:39                                       ` Joel Fernandes
2019-07-15 20:09                                         ` Paul E. McKenney
2019-07-18 16:14                                   ` Joel Fernandes
2019-07-18 16:15                                     ` Joel Fernandes
2019-07-18 21:34                                     ` Paul E. McKenney [this message]
2019-07-19  0:48                                       ` Joel Fernandes
2019-07-19  0:54                                       ` Byungchul Park
2019-07-19  0:39                                     ` Byungchul Park
2019-07-19  0:52                                       ` Joel Fernandes
2019-07-19  1:10                                         ` Byungchul Park
2019-07-19  7:43                                         ` Paul E. McKenney
2019-07-19  9:57                                           ` Byungchul Park
2019-07-19 19:57                                             ` Paul E. McKenney
2019-07-19 20:33                                               ` Joel Fernandes
2019-07-23 11:05                                                 ` Byungchul Park
2019-07-23 13:47                                                   ` Paul E. McKenney
2019-07-23 16:54                                                     ` Paul E. McKenney
2019-07-24  7:58                                                       ` Byungchul Park
2019-07-24  7:59                                                     ` Byungchul Park
2019-07-12 13:01                     ` Paul E. McKenney
2019-07-12 13:40                       ` Joel Fernandes
2019-07-12  6:00                 ` Byungchul Park
2019-07-12  5:52               ` Byungchul Park
2019-07-12  5:48             ` Byungchul Park
2019-07-13  9:08               ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190718213419.GV14271@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=byungchul.park@lge.com \
    --cc=joel@joelfernandes.org \
    --cc=kernel-team@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.byungchul.park@gmail.com \
    --cc=rcu@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.