All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	tglx@linutronix.de, Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH v3] rcu: Allow to eliminate softirq processing from rcutree
Date: Fri, 22 Mar 2019 11:50:49 -0400	[thread overview]
Message-ID: <20190322155049.GA86662@google.com> (raw)
In-Reply-To: <20190322145823.GM4102@linux.ibm.com>

On Fri, Mar 22, 2019 at 07:58:23AM -0700, Paul E. McKenney wrote:
[snip]
> > > >  #ifdef CONFIG_RCU_NOCB_CPU
> > > >  static cpumask_var_t rcu_nocb_mask; /* CPUs to have callbacks offloaded. */
> > > > @@ -94,6 +72,8 @@ static void __init rcu_bootup_announce_oddness(void)
> > > >  		pr_info("\tRCU debug GP init slowdown %d jiffies.\n", gp_init_delay);
> > > >  	if (gp_cleanup_delay)
> > > >  		pr_info("\tRCU debug GP init slowdown %d jiffies.\n", gp_cleanup_delay);
> > > > +	if (!use_softirq)
> > > > +		pr_info("\tRCU_SOFTIRQ processing moved to rcuc kthreads.\n");
> > > >  	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG))
> > > >  		pr_info("\tRCU debug extended QS entry/exit.\n");
> > > >  	rcupdate_announce_bootup_oddness();
> > > > @@ -629,7 +609,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > >  		if (irqs_were_disabled) {
> > > >  			/* Enabling irqs does not reschedule, so... */
> > > > -			raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > +			if (!use_softirq)
> > > > +				raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > +			else
> > > > +				invoke_rcu_core();
> > > 
> > > This can result in deadlock.  This happens when the scheduler invokes
> > > rcu_read_unlock() with one of the rq or pi locks held, which means that
> > > interrupts are disabled.  And it also means that the wakeup done in
> > > invoke_rcu_core() could go after the same rq or pi lock.
> > > 
> > > What we really need here is some way to make soemthing happen on this
> > > CPU just after interrupts are re-enabled.  Here are the options I see:
> > > 
> > > 1.	Do set_tsk_need_resched() and set_preempt_need_resched(),
> > > 	just like in the "else" clause below.  This sort of works, but
> > > 	relies on some later interrupt or similar to get things started.
> > > 	This is just fine for normal grace periods, but not so much for
> > > 	expedited grace periods.
> > > 
> > > 2.	IPI some other CPU and have it IPI us back.  Not such a good plan
> > > 	when running an SMP kernel on a single CPU.
> > > 
> > > 3.	Have a "stub" RCU_SOFTIRQ that contains only the following:
> > > 
> > > 	/* Report any deferred quiescent states if preemption enabled. */
> > > 	if (!(preempt_count() & PREEMPT_MASK)) {
> > > 		rcu_preempt_deferred_qs(current);
> > > 	} else if (rcu_preempt_need_deferred_qs(current)) {
> > > 		set_tsk_need_resched(current);
> > > 		set_preempt_need_resched();
> > > 	}
> > > 
> > > 4.	Except that raise_softirq_irqoff() could potentially have this
> > > 	same problem if rcu_read_unlock() is invoked at process level
> > > 	from the scheduler with either rq or pi locks held.  :-/
> > > 
> > > 	Which raises the question "why aren't I seeing hangs and
> > > 	lockdep splats?"
> > 
> > Interesting, could it be you're not seeing a hang in the regular case,
> > because enqueuing ksoftirqd on the same CPU as where the rcu_read_unlock is
> > happening is a rare event? First, ksoftirqd has to even be awakened in the
> > first place. On the other hand, with the new code the thread is always awaked
> > and is more likely to run into the issue you found?
> 
> No, in many cases, including the self-deadlock that showed up last night,
> raise_softirq_irqoff() will simply set a bit in a per-CPU variable.
> One case where this happens is when called from an interrupt handler.

I think we are saying the same thing, in some cases ksoftirqd will be
awakened and some case it will not. I will go through all scenarios to
convince myself it is safe, if I find some issue I will let you know.

> > The lockdep splats should be a more common occurence though IMO. If you could
> > let me know which RCU config is hanging, I can try to debug this at my end as
> > well.
> 
> TREE01, TREE02, TREE03, and TREE09.  I would guess that TREE08 would also
> do the same thing, given that it also sets PREEMPT=y and tests Tree RCU.
> 
> Please see the patch I posted and tested overnight.  I suspect that there
> is a better fix, but this does at least seem to suppress the error.

Ok, will do.

> > > Also, having lots of non-migratable timers might be considered unfriendly,
> > > though they shouldn't be -that- heavily utilized.  Yet, anyway...
> > > I could try adding logic to local_irq_enable() and local_irq_restore(),
> > > but that probably wouldn't go over all that well.  Besides, sometimes
> > > interrupt enabling happens in assembly language.
> > > 
> > > It is quite likely that delays to expedited grace periods wouldn't
> > > happen all that often.  First, the grace period has to start while
> > > the CPU itself (not some blocked task) is in an RCU read-side critical
> > > section, second, that critical section cannot be preempted, and third
> > > the rcu_read_unlock() must run with interrupts disabled.
> > > 
> > > Ah, but that sequence of events is not supposed to happen with the
> > > scheduler lock!
> > > 
> > > From Documentation/RCU/Design/Requirements/Requirements.html:
> > > 
> > > 	It is forbidden to hold any of scheduler's runqueue or
> > > 	priority-inheritance spinlocks across an rcu_read_unlock()
> > > 	unless interrupts have been disabled across the entire RCU
> > > 	read-side critical section, that is, up to and including the
> > > 	matching rcu_read_lock().
> > > 
> > > Here are the reasons we even get to rcu_read_unlock_special():
> > > 
> > > 1.	The just-ended RCU read-side critical section was preempted.
> > > 	This clearly cannot happen if interrupts are disabled across
> > > 	the entire critical section.
> > > 
> > > 2.	The scheduling-clock interrupt noticed that this critical
> > > 	section has been taking a long time.  But scheduling-clock
> > > 	interrupts also cannot happen while interrupts are disabled.
> > > 
> > > 3.	An expedited grace periods started during this critical
> > > 	section.  But if that happened, the corresponding IPI would
> > > 	have waited until this CPU enabled interrupts, so this
> > > 	cannot happen either.
> > > 
> > > So the call to invoke_rcu_core() should be OK after all.
> > > 
> > > Which is a bit of a disappointment, given that I am still seeing hangs!
> > 
> > Oh ok, discount whatever I just said then ;-) Indeed I remember this
> > requirement too now. Your neat documentation skills are indeed life saving :D
> 
> No, this did turn out to be the problem area.  Or at least one of the
> problem areas.  Again, see my earlier email.

Ok. Too many emails so I got confused :-D. I also forgot which version of the
patch are we testing since I don't think an updated one was posted. But I
will refer to your last night diff dig out the base patch from your git tree,
no problem.

> > > I might replace this invoke_rcu_core() with set_tsk_need_resched() and
> > > set_preempt_need_resched() to see if that gets rid of the hangs, but
> > > first...
> > 
> > Could we use the NMI watchdog to dump the stack at the time of the hang? May
> > be a deadlock will present on the stack (I think its config is called
> > HARDLOCKUP_DETECTOR or something).
> 
> Another approach would be to instrument the locking code that notices
> the recursive acquisition.  Or to run lockdep...  Because none of the
> failing scenarios enable lockdep!  ;-)

I was wondering why lockdep is not always turned on in your testing. Is it
due to performance concerns?

thanks,

 - Joel

  reply	other threads:[~2019-03-22 15:50 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-15 11:11 [PATCH] rcu: Allow to eliminate softirq processing from rcutree Sebastian Andrzej Siewior
2019-03-15 13:35 ` Steven Rostedt
2019-03-15 13:57   ` Sebastian Andrzej Siewior
2019-03-18  2:24 ` Paul E. McKenney
2019-03-19 11:44   ` [PATCH v2] " Sebastian Andrzej Siewior
2019-03-19 15:59     ` Paul E. McKenney
2019-03-19 16:24       ` Sebastian Andrzej Siewior
2019-03-19 16:50         ` Paul E. McKenney
2019-03-19 17:02           ` Sebastian Andrzej Siewior
2019-03-20 11:32     ` Sebastian Andrzej Siewior
2019-03-20 15:21       ` Paul E. McKenney
2019-03-20 15:44         ` Paul E. McKenney
2019-03-20 16:05           ` Sebastian Andrzej Siewior
2019-03-20 16:15             ` Paul E. McKenney
2019-03-20 16:35               ` Sebastian Andrzej Siewior
2019-03-20 17:30                 ` Paul E. McKenney
2019-03-20 17:59                   ` Sebastian Andrzej Siewior
2019-03-20 18:12                     ` Paul E. McKenney
2019-03-20 18:14                       ` Sebastian Andrzej Siewior
2019-03-20 21:13                         ` [PATCH v3] " Sebastian Andrzej Siewior
2019-03-20 23:46                           ` Paul E. McKenney
2019-03-21  8:27                             ` Sebastian Andrzej Siewior
2019-03-21 13:26                               ` Paul E. McKenney
2019-03-21 23:32                             ` Paul E. McKenney
2019-03-22  7:35                               ` Paul E. McKenney
2019-03-22 12:43                                 ` Paul E. McKenney
2019-03-22 13:42                               ` Joel Fernandes
2019-03-22 14:58                                 ` Paul E. McKenney
2019-03-22 15:50                                   ` Joel Fernandes [this message]
2019-03-22 16:26                                     ` Paul E. McKenney
2019-03-22 18:07                                       ` Paul E. McKenney
2019-03-22 23:48                           ` Joel Fernandes
2019-03-23  0:25                             ` Paul E. McKenney
2019-03-23  1:04                               ` Joel Fernandes
2019-03-23 16:10                               ` Paul E. McKenney
2019-03-24 23:42                                 ` Paul E. McKenney
2019-03-25 13:41                                   ` Joel Fernandes
2019-03-25 15:08                                     ` Paul E. McKenney
2019-03-25 15:52                                       ` Paul E. McKenney
2019-03-20  0:26 ` [PATCH] " Joel Fernandes
2019-03-20 11:28   ` Sebastian Andrzej Siewior
2019-03-21 12:06     ` Joel Fernandes
2019-03-21 13:52       ` Paul E. McKenney
2019-03-20 15:24   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190322155049.GA86662@google.com \
    --to=joel@joelfernandes.org \
    --cc=bigeasy@linutronix.de \
    --cc=efault@gmx.de \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.