linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	rcu <rcu@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>
Subject: Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs
Date: Fri, 28 Jun 2019 08:30:50 -0700	[thread overview]
Message-ID: <20190628153050.GU26519@linux.ibm.com> (raw)
In-Reply-To: <20190628135433.GE3402@hirez.programming.kicks-ass.net>

On Fri, Jun 28, 2019 at 03:54:33PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 27, 2019 at 11:41:07AM -0700, Paul E. McKenney wrote:
> > Or just don't do the wakeup at all, if it comes to that.  I don't know
> > of any way to determine whether rcu_read_unlock() is being called from
> > the scheduler, but it has been some time since I asked Peter Zijlstra
> > about that.
> 
> There (still) is no 'in-scheduler' state.

Well, my TREE03 + threadirqs rcutorture test ran for ten hours last
night with no problems, so we just might be OK.

The apparent fix is below, though my approach would be to do backports
for the full set of related changes.

Joel, Sebastian, how goes any testing from your end?  Any reason
to believe that this does not represent a fix?  (Me, I am still
concerned about doing raise_softirq() from within a threaded
interrupt, but am not seeing failures.)

							Thanx, Paul

------------------------------------------------------------------------

commit 23634ebc1d946f19eb112d4455c1d84948875e31
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Sun Mar 24 15:25:51 2019 -0700

    rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()
    
    When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc
    kthreads, a full and unconditional wakeup is required to initiate RCU
    core processing.  In contrast, when RCU core processing is carried
    out by RCU_SOFTIRQ, a raise_softirq() suffices.  Of course, there are
    situations where raise_softirq() does a full wakeup, but these do not
    occur with normal usage of rcu_read_unlock().
    
    The reason that full wakeups can be problematic is that the scheduler
    sometimes invokes rcu_read_unlock() with its pi or rq locks held,
    which can of course result in deadlock in CONFIG_PREEMPT=y kernels when
    rcu_read_unlock() invokes the scheduler.  Scheduler invocations can happen
    in the following situations: (1) The just-ended reader has been subjected
    to RCU priority boosting, in which case rcu_read_unlock() must deboost,
    (2) Interrupts were disabled across the call to rcu_read_unlock(), so
    the quiescent state must be deferred, requiring a wakeup of the rcuc
    kthread corresponding to the current CPU.
    
    Now, the scheduler may hold one of its locks across rcu_read_unlock()
    only if preemption has been disabled across the entire RCU read-side
    critical section, which in the days prior to RCU flavor consolidation
    meant that rcu_read_unlock() never needed to do wakeups.  However, this
    is no longer the case for any but the first rcu_read_unlock() following a
    condition (e.g., preempted RCU reader) requiring special rcu_read_unlock()
    attention.  For example, an RCU read-side critical section might be
    preempted, but preemption might be disabled across the rcu_read_unlock().
    The rcu_read_unlock() must defer the quiescent state, and therefore
    leaves the task queued on its leaf rcu_node structure.  If a scheduler
    interrupt occurs, the scheduler might well invoke rcu_read_unlock() with
    one of its locks held.  However, the preempted task is still queued, so
    rcu_read_unlock() will attempt to defer the quiescent state once more.
    When RCU core processing is carried out by RCU_SOFTIRQ, this works just
    fine: The raise_softirq() function simply sets a bit in a per-CPU mask
    and the RCU core processing will be undertaken upon return from interrupt.
    
    Not so when RCU core processing is carried out by the rcuc kthread: In this
    case, the required wakeup can result in deadlock.
    
    The initial solution to this problem was to use set_tsk_need_resched() and
    set_preempt_need_resched() to force a future context switch, which allows
    rcu_preempt_note_context_switch() to report the deferred quiescent state
    to RCU's core processing.  Unfortunately for expedited grace periods,
    there can be a significant delay between the call for a context switch
    and the actual context switch.
    
    This commit therefore introduces a ->deferred_qs flag to the task_struct
    structure's rcu_special structure.  This flag is initially false, and
    is set to true by the first call to rcu_read_unlock() requiring special
    attention, then finally reset back to false when the quiescent state is
    finally reported.  Then rcu_read_unlock() attempts full wakeups only when
    ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring
    special attention.  Note that a chain of RCU readers linked by some other
    sort of reader may find that a later rcu_read_unlock() is once again able
    to do a full wakeup, courtesy of an intervening preemption:
    
            rcu_read_lock();
            /* preempted */
            local_irq_disable();
            rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */
            rcu_read_lock();
            local_irq_enable();
            preempt_disable()
            rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */
            rcu_read_lock();
            preempt_enable();
            /* preempted, >deferred_qs reset. */
            local_irq_disable();
            rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */
    
    Such linked RCU readers do not yet seem to appear in the Linux kernel, and
    it is probably best if they don't.  However, RCU needs to handle them, and
    some variations on this theme could make even raise_softirq() unsafe due to
    the possibility of its doing a full wakeup.  This commit therefore also
    avoids invoking raise_softirq() when the ->deferred_qs set flag is set.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..942a44c1b8eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -565,7 +565,7 @@ union rcu_special {
 		u8			blocked;
 		u8			need_qs;
 		u8			exp_hint; /* Hint for performance. */
-		u8			pad; /* No garbage from compiler! */
+		u8			deferred_qs;
 	} b; /* Bits. */
 	u32 s; /* Set of bits. */
 };
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 21611862e083..75110ea75d01 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -455,6 +455,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 		local_irq_restore(flags);
 		return;
 	}
+	t->rcu_read_unlock_special.b.deferred_qs = false;
 	if (special.b.need_qs) {
 		rcu_qs();
 		t->rcu_read_unlock_special.b.need_qs = false;
@@ -605,16 +606,24 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	local_irq_save(flags);
 	irqs_were_disabled = irqs_disabled_flags(flags);
 	if (preempt_bh_were_disabled || irqs_were_disabled) {
-		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
-		/* Need to defer quiescent state until everything is enabled. */
-		if (irqs_were_disabled && use_softirq) {
-			/* Enabling irqs does not reschedule, so... */
+		t->rcu_read_unlock_special.b.exp_hint = false;
+		// Need to defer quiescent state until everything is enabled.
+		if (irqs_were_disabled && use_softirq &&
+		    (in_irq() || !t->rcu_read_unlock_special.b.deferred_qs)) {
+			// Using softirq, safe to awaken, and we get
+			// no help from enabling irqs, unlike bh/preempt.
 			raise_softirq_irqoff(RCU_SOFTIRQ);
+		} else if (irqs_were_disabled && !use_softirq &&
+			   !t->rcu_read_unlock_special.b.deferred_qs) {
+			// Safe to awaken and we get no help from enabling
+			// irqs, unlike bh/preempt.
+			invoke_rcu_core();
 		} else {
-			/* Enabling BH or preempt does reschedule, so... */
+			// Enabling BH or preempt does reschedule, so...
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 		}
+		t->rcu_read_unlock_special.b.deferred_qs = true;
 		local_irq_restore(flags);
 		return;
 	}

  reply	other threads:[~2019-06-28 15:31 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-26 13:54 [RFC] Deadlock via recursive wakeup via RCU with threadirqs Sebastian Andrzej Siewior
2019-06-26 16:25 ` Paul E. McKenney
2019-06-27  7:47   ` Sebastian Andrzej Siewior
2019-06-27 15:52     ` Paul E. McKenney
2019-06-27 14:24   ` Joel Fernandes
2019-06-27 14:34     ` Steven Rostedt
2019-06-27 15:30       ` Joel Fernandes
2019-06-27 15:37         ` Joel Fernandes
2019-06-27 15:40           ` Sebastian Andrzej Siewior
2019-06-27 15:42             ` Joel Fernandes
2019-06-27 17:43             ` Joel Fernandes
2019-06-27 17:46               ` Joel Fernandes
2019-06-27 18:11                 ` Paul E. McKenney
2019-06-27 18:27                   ` Joel Fernandes
2019-06-27 18:51                     ` Joel Fernandes
2019-06-27 19:14                       ` Paul E. McKenney
2019-06-27 19:15                     ` Paul E. McKenney
2019-06-27 18:30                   ` Paul E. McKenney
2019-06-27 20:45                     ` Paul E. McKenney
2019-06-27 15:55         ` Paul E. McKenney
2019-06-27 16:47           ` Joel Fernandes
2019-06-27 17:38             ` Paul E. McKenney
2019-06-27 18:16               ` Joel Fernandes
2019-06-27 18:41                 ` Paul E. McKenney
2019-06-27 20:17                   ` Scott Wood
2019-06-27 20:36                     ` Paul E. McKenney
2019-06-28  7:31                       ` Byungchul Park
2019-06-28  7:43                         ` Byungchul Park
2019-06-28  8:14                           ` Byungchul Park
2019-06-28  8:24                             ` Byungchul Park
2019-06-28 12:24                               ` Paul E. McKenney
2019-06-28  9:10                           ` Byungchul Park
2019-06-28  9:28                             ` Byungchul Park
2019-06-28 12:21                           ` Paul E. McKenney
2019-06-28 10:40                         ` Byungchul Park
2019-06-28 12:27                           ` Paul E. McKenney
2019-06-28 15:44                           ` Steven Rostedt
2019-06-29 15:12                             ` Andrea Parri
2019-06-29 16:55                               ` Paul E. McKenney
2019-06-29 18:09                                 ` Andrea Parri
2019-06-29 18:21                                   ` Andrea Parri
2019-06-29 19:15                                   ` Paul E. McKenney
2019-06-29 19:35                                     ` Andrea Parri
2019-06-30 23:55                             ` Byungchul Park
2019-06-28 14:15                       ` Peter Zijlstra
2019-06-28 15:54                         ` Paul E. McKenney
2019-06-28 16:04                           ` Peter Zijlstra
2019-06-28 17:20                             ` Paul E. McKenney
2019-07-01  9:42                               ` Peter Zijlstra
2019-07-01 10:24                                 ` Sebastian Andrzej Siewior
2019-07-01 12:23                                   ` Paul E. McKenney
2019-07-01 14:00                                     ` Peter Zijlstra
2019-07-01 16:01                                       ` Paul E. McKenney
2019-06-28 20:01                         ` Scott Wood
2019-07-01  9:45                           ` Peter Zijlstra
2019-06-28 13:54                   ` Peter Zijlstra
2019-06-28 15:30                     ` Paul E. McKenney [this message]
2019-06-28 18:40                       ` Sebastian Andrzej Siewior
2019-06-28 18:52                         ` Paul E. McKenney
2019-06-28 19:24                           ` Joel Fernandes
2019-06-28 20:04                             ` Paul E. McKenney
2019-06-28 21:40                               ` Joel Fernandes
2019-06-28 22:25                                 ` Paul E. McKenney
2019-06-28 23:12                                   ` Joel Fernandes
2019-06-29  0:06                                     ` Paul E. McKenney
2019-06-28 16:40                   ` Joel Fernandes
2019-06-28 16:45                     ` Joel Fernandes
2019-06-28 17:30                       ` Paul E. McKenney
2019-06-28 17:41                         ` Paul E. McKenney
2019-06-28 17:45                         ` Sebastian Andrzej Siewior
2019-06-28 18:07                           ` Joel Fernandes
2019-06-28 18:20                             ` Sebastian Andrzej Siewior
2019-07-01  2:08                               ` Joel Fernandes
2019-06-28 18:22                           ` Paul E. McKenney
2019-06-28 19:29                             ` Joel Fernandes
2019-06-28 20:06                               ` Paul E. McKenney
2019-06-28 18:05                         ` Joel Fernandes
2019-06-28 18:23                           ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190628153050.GU26519@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=bigeasy@linutronix.de \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).