From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Byungchul Park <byungchul.park@lge.com>
Cc: Scott Wood <swood@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	rcu <rcu@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	kernel-team@lge.com
Subject: Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs
Date: Fri, 28 Jun 2019 05:24:07 -0700
Message-ID: <20190628122407.GR26519@linux.ibm.com>
In-Reply-To: <20190628082438.GB22890@X58A-UD3R>

On Fri, Jun 28, 2019 at 05:24:38PM +0900, Byungchul Park wrote:
> On Fri, Jun 28, 2019 at 05:14:32PM +0900, Byungchul Park wrote:
> > On Fri, Jun 28, 2019 at 04:43:50PM +0900, Byungchul Park wrote:
> > > On Fri, Jun 28, 2019 at 04:31:38PM +0900, Byungchul Park wrote:
> > > > On Thu, Jun 27, 2019 at 01:36:12PM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Jun 27, 2019 at 03:17:27PM -0500, Scott Wood wrote:
> > > > > > On Thu, 2019-06-27 at 11:41 -0700, Paul E. McKenney wrote:
> > > > > > > On Thu, Jun 27, 2019 at 02:16:38PM -0400, Joel Fernandes wrote:
> > > > > > > > 
> > > > > > > > I think the fix should be to prevent the wake-up not based on whether
> > > > > > > > we are in hard/soft-interrupt mode, but on whether we are doing the
> > > > > > > > rcu_read_unlock() from a scheduler path (if we can detect that).
> > > > > > > 
> > > > > > > Or just don't do the wakeup at all, if it comes to that.  I don't know
> > > > > > > of any way to determine whether rcu_read_unlock() is being called from
> > > > > > > the scheduler, but it has been some time since I asked Peter Zijlstra
> > > > > > > about that.
> > > > > > > 
> > > > > > > Of course, unconditionally refusing to do the wakeup might not be a
> > > > > > > happy thing for NO_HZ_FULL kernels that don't implement IRQ work.
> > > > > > 
> > > > > > Couldn't smp_send_reschedule() be used instead?
> > > > > 
> > > > > Good point.  If current -rcu doesn't fix things for Sebastian's case,
> > > > > that would be well worth looking at.  But there must be some reason
> > > > > why Peter Zijlstra didn't suggest it when he instead suggested using
> > > > > the IRQ work approach.
> > > > > 
> > > > > Peter, thoughts?
> > > > 
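For concreteness, here is a rough sketch of the two deferral approaches
being compared.  The names my_defer_work and my_defer_handler are made
up for illustration; this is not the code in tree_plugin.h, just the
shape of "get out of rcu_read_unlock() without doing a wakeup there":

#include <linux/irq_work.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <linux/smp.h>

static struct irq_work my_defer_work;

/* Runs later in hard-irq context, where requesting a resched is safe. */
static void my_defer_handler(struct irq_work *work)
{
        set_tsk_need_resched(current);
        set_preempt_need_resched();
}

/* Option 1: queue IRQ work (requires IRQ-work support). */
static void defer_via_irq_work(void)
{
        init_irq_work(&my_defer_work, my_defer_handler);
        irq_work_queue(&my_defer_work);
}

/*
 * Option 2: send the scheduler IPI to ourselves; the IPI path will
 * reschedule on return from interrupt.  Caller must have preemption
 * (or interrupts) disabled for smp_processor_id() to be stable.
 */
static void defer_via_scheduler_ipi(void)
{
        smp_send_reschedule(smp_processor_id());
}

But there may well be reasons why the self-IPI is not suitable here,
which is what I am hoping Peter can tell us.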
> > > 
> > > +cc kernel-team@lge.com
> > > (Sorry for adding more noise to the thread.)
> > > 
> > > > Hello,
> > > > 
> > > > Isn't the following scenario possible?
> > > > 
> > > > The original code
> > > > -----------------
> > > > rcu_read_lock();
> > > > ...
> > > > /* Expedite */
> > > > WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > ...
> > > > __rcu_read_unlock();
> > > > 	if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> > > > 		rcu_read_unlock_special(t);
> > > > 			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > 			rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > 		barrier();  /* ->rcu_read_unlock_special load before assign */
> > > > 		t->rcu_read_lock_nesting = 0;
> > > > 
> > > > The reordered code by machine
> > > > -----------------------------
> > > > rcu_read_lock();
> > > > ...
> > > > /* Expedite */
> > > > WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > ...
> > > > __rcu_read_unlock();
> > > > 	if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> > > > 		rcu_read_unlock_special(t);
> > > > 		t->rcu_read_lock_nesting = 0; <--- LOOK AT THIS!!!
> > > > 			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > 			rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > 		barrier();  /* ->rcu_read_unlock_special load before assign */
> > > > 
> > > > An interrupt happens
> > > > --------------------
> > > > rcu_read_lock();
> > > > ...
> > > > /* Expedite */
> > > > WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > ...
> > > > __rcu_read_unlock();
> > > > 	if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> > > > 		rcu_read_unlock_special(t);
> > > > 		t->rcu_read_lock_nesting = 0; <--- LOOK AT THIS!!!
> > > > <--- Handle an (any) irq
> > > > 	rcu_read_lock();
> > > > 	/* This call should be skipped */
> > > > 	rcu_read_unlock_special(t);
> > > > 			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > 			rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > 		barrier();  /* ->rcu_read_unlock_special load before assign */
> > > > 
> > > > We should not end up handling the special work twice like this, which is
> > > > one cause of the problem, even though another problem is of course calling
> > > > ttwu() without being aware that we are in a context holding the pi lock.
> > > > 
> > > > Apart from the discussion about how to avoid calling ttwu() in an unsafe
> > > > context, I think the following is necessary. I may be missing something,
> > > > so I would appreciate it if you let me know in case I'm wrong.
> > > > 
> > > > Anyway, logically I think we should prevent reordering between
> > > > t->rcu_read_lock_nesting and t->rcu_read_unlock_special.b.exp_hint not
> > > > only by the compiler but also by the CPU, as below.
> > > > 
> > > > Am I missing something?
> > > > 
> > > > Thanks,
> > > > Byungchul
> > > > 
> > > > ---8<---
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 3c8444e..9b137f1 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -412,7 +412,13 @@ void __rcu_read_unlock(void)
> > > >  		barrier();  /* assign before ->rcu_read_unlock_special load */
> > > >  		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> > > >  			rcu_read_unlock_special(t);
> > > > -		barrier();  /* ->rcu_read_unlock_special load before assign */
> > > > +		/*
> > > > +		 * Prevent reordering between clearing
> > > > +		 * t->rcu_read_unlock_special in
> > > > +		 * rcu_read_unlock_special() and the following
> > > > +		 * assignment to t->rcu_read_lock_nesting.
> > > > +		 */
> > > > +		smp_wmb();
> > 
> > Ah. But the problem is that this makes rcu_read_unlock() heavier, which
> > is too bad. We need to consider something else. But I'm still curious
> > whether the scenario I described above is correct.
> 
> Instead, this patch should be replaced with the following:
> 
> ---8<---
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c8444e..f103e98 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -624,8 +624,15 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  
>  	local_irq_save(flags);
>  	irqs_were_disabled = irqs_disabled_flags(flags);
> +
> +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> +	/*
> +	 * Prevent reordering between rcu_read_unlock_special.b.exp_hint
> +	 * above and rcu_read_lock_nesting outside of this function.
> +	 */
> +	smp_wmb();

Except that these are manipulated by the current CPU (aside from debug
code), so inter-CPU ordering is not needed.  Plus .exp_hint is just a
heuristic.

Or am I missing something subtle here?  If so, please provide a
step-by-step explanation of the failure scenario.
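
To make the same-CPU point concrete, here is a toy sketch, with made-up
variables standing in for ->rcu_read_unlock_special.s and
->rcu_read_lock_nesting (this is not the actual tree_plugin.h code).  An
interrupt handler on the CPU that did the stores observes them in program
order, so barrier() is enough to keep the compiler honest, and smp_wmb()
would only matter if some other CPU were doing ordered reads of both
fields:

#include <linux/bug.h>
#include <linux/compiler.h>

static int special = 1;	/* stands in for t->rcu_read_unlock_special.s */
static int nesting = 1;	/* stands in for t->rcu_read_lock_nesting */

static void unlock_path(void)		/* task context */
{
        WRITE_ONCE(special, 0);
        barrier();	/* keep the compiler from reordering the stores */
        nesting = 0;	/* plain store, as in __rcu_read_unlock() */
}

static void irq_handler_on_same_cpu(void)
{
        /*
         * If this handler interrupts unlock_path() after the store to
         * nesting, it also sees special == 0, with or without smp_wmb(),
         * because a CPU sees its own prior stores in program order.
         * So this warning cannot fire.
         */
        WARN_ON(READ_ONCE(nesting) == 0 && READ_ONCE(special) != 0);
}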

							Thanx, Paul

> +
>  	if (preempt_bh_were_disabled || irqs_were_disabled) {
> -		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>  		/* Need to defer quiescent state until everything is enabled. */
>  		if (irqs_were_disabled) {
>  			/* Enabling irqs does not reschedule, so... */
> @@ -638,7 +645,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		local_irq_restore(flags);
>  		return;
>  	}
> -	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }
>  
> 
