linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, joel@joelfernandes.org
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs
Date: Mon, 5 Aug 2019 10:05:31 +0200	[thread overview]
Message-ID: <20190805080531.GH2349@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20190804184159.GC28441@linux.ibm.com>

On Sun, Aug 04, 2019 at 11:41:59AM -0700, Paul E. McKenney wrote:
> On Sun, Aug 04, 2019 at 04:48:35PM +0200, Peter Zijlstra wrote:
> > On Sun, Aug 04, 2019 at 04:43:17PM +0200, Peter Zijlstra wrote:
> > > On Fri, Aug 02, 2019 at 08:15:01AM -0700, Paul E. McKenney wrote:
> > > > The multi_cpu_stop() function relies on the scheduler to gain control from
> > > > whatever is running on the various online CPUs, including any nohz_full
> > > > CPUs running long loops in kernel-mode code.  Lack of the scheduler-clock
> > > > interrupt on such CPUs can delay multi_cpu_stop() for several minutes
> > > > and can also result in RCU CPU stall warnings.  This commit therefore
> > > > causes multi_cpu_stop() to enable the scheduler-clock interrupt on all
> > > > online CPUs.
> > > 
> > > This sounds wrong; should we be fixing sched_can_stop_tick() instead to
> > > return false when the stop task is runnable?
> 
> Agreed.  However, it is proving surprisingly hard to come up with a
> code sequence that has the effect of rcu_nocb without nohz_full.
> And rcu_nocb works just fine.  With nohz_full also in place, I am
> decreasing the failure rate, but it still fails, perhaps a few times
> per hour of TREE04 rcutorture on an eight-CPU system.  (My 12-CPU
> system stubbornly refuses to fail.  Good thing I kept the eight-CPU
> system around, I guess.)
> 
> When I arrive at some sequence of actions that actually work reliably,
> then by all means let's put it somewhere in the NO_HZ_FULL machinery!

I'm confused; what are you arguing? The patch as proposed is just wrong,
it needs to go.

> > And even without that; I don't understand how we're not instantly
> > preempted the moment we enqueue the stop task.
> 
> There is no preemption because CONFIG_PREEMPT=n for the scenarios still

That doesn't make sense; even with CONFIG_PREEMPT=n we set
TIF_NEED_RESCHED. We'll just not react to it as promptly (only explicit
rescheduling points and return to userspace). Enabling the tick will not
make any difference what so ever.

Tick based preemption will not 'fix' the lack of wakeup preemption. If
the stop task wakeup didn't set TIF_NEED_RESCHED, the OTHER/CFS tick
will not either.

> having trouble.  Yes, there are cond_resched() calls, but they don't do
> anything unless the appropriate flags are set, which won't always happen
> without the tick, apparently.  Or without -something- that isn't always
> happening as it should.

Right; so clearly we're not understanding what's happening. That seems
like a requirement for actually doing a patch.

> > Any enqueue, should go through check_preempt_curr() which will be an
> > instant resched_curr() when we just woke the stop class.
> 
> I did try hitting all of the CPUs with resched_cpu().  Ten times on each
> CPU with a ten-jiffy wait between each.  This might have decreased the
> probability of excessively long CPU-stopper waits by a factor of two or
> three, but it did not eliminate the excessively long waits.
> 
> What else should I try?
> 
> For example, are there any diagnostics I could collect, say from within
> the CPU stopper when things are taking too long?  I see CPU-stopper
> delays in excess of five -minutes-, so this is anything but subtle.

Catch the whole thing in a function trace?

The chain that should instantly set TIF_NEED_RESCHED:

  stop_machine()
    stop_machine_cpuslocked()
      stop_cpus()
        __stop_cpus()
          queue_stop_cpus_work()
            cpu_stop_queue_work()
	      wake_up_q()
	        wake_up_process()


  wake_up_process()
    try_to_wake_up()
      ttwu_queue()
        ttwu_queue_remote()
	  <- scheduler_ipi()
	    sched_ttwu_pending()
	      ttwu_do_activate()

        ttwu_do_activate()
	  activate_task()
	  ttwu_do_wakeup()
	    check_preempt_curr()
	      resched_curr()

You could frob some tracing into __stop_cpus(), before
wait_for_completion(), at that point all the CPUs in @cpumask should
either be running the stop task or have TIF_NEED_RESCHED set.



  parent reply	other threads:[~2019-08-05  8:06 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 15:14 [PATCH tip/core/rcu 0/14] No-CBs bypass addition for v5.4 Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 01/14] rcu/nocb: Atomic ->len field in rcu_segcblist structure Paul E. McKenney
2019-08-04 14:50   ` Peter Zijlstra
2019-08-04 14:52     ` Peter Zijlstra
2019-08-04 18:45       ` Paul E. McKenney
2019-08-04 18:42     ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing Paul E. McKenney
2019-08-07  0:03   ` Joel Fernandes
2019-08-07  0:16     ` Joel Fernandes
2019-08-07  0:35     ` Paul E. McKenney
2019-08-07  0:40       ` Steven Rostedt
2019-08-07  1:17         ` Paul E. McKenney
2019-08-07  1:24           ` Steven Rostedt
2019-08-07  3:47             ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 03/14] rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 04/14] rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 05/14] rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 06/14] rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 07/14] rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 08/14] rcu/nocb: Reduce __call_rcu_nocb_wake() " Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 09/14] rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 10/14] rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 11/14] EXP nohz: Add TICK_DEP_BIT_RCU Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 12/14] rcu/nohz: Force on tick when invoking lots of callbacks Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 13/14] rcutorture: Force on tick for readers and callback flooders Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs Paul E. McKenney
2019-08-04 14:43   ` Peter Zijlstra
2019-08-04 14:48     ` Peter Zijlstra
2019-08-04 18:41       ` Paul E. McKenney
2019-08-04 20:24         ` Paul E. McKenney
2019-08-05  4:19           ` Paul E. McKenney
2019-08-05  8:07             ` Peter Zijlstra
2019-08-05 14:47               ` Paul E. McKenney
2019-08-05  8:05         ` Peter Zijlstra [this message]
2019-08-05 14:54           ` Paul E. McKenney
2019-08-05 15:50             ` Peter Zijlstra
2019-08-05 17:48               ` Paul E. McKenney
2019-08-06 18:08                 ` Paul E. McKenney
2019-08-07 21:41                   ` Paul E. McKenney
2019-08-08 20:35                     ` Paul E. McKenney
2019-08-08 21:30                       ` Paul E. McKenney
2019-08-09 16:51                         ` Paul E. McKenney
2019-08-09 18:07                           ` Joel Fernandes
2019-08-09 18:39                             ` Paul E. McKenney
2019-08-12 21:02   ` Frederic Weisbecker
2019-08-12 23:23     ` Paul E. McKenney
2019-08-13  1:33       ` Joel Fernandes
2019-08-13 12:30       ` Frederic Weisbecker
2019-08-13 14:48         ` Paul E. McKenney
2019-08-14 17:55           ` Joel Fernandes
2019-08-14 22:05             ` Paul E. McKenney
2019-08-15 15:07               ` Joel Fernandes
2019-08-15 17:23                 ` Paul E. McKenney
2019-08-15 18:15                   ` Joel Fernandes
2019-08-15 18:39                     ` Paul E. McKenney
2019-08-15 19:42                       ` Joel Fernandes
2019-08-13 21:06       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805080531.GH2349@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).