* Quick review of -rt RCU-related patches
@ 2011-10-04 17:47 Paul E. McKenney
  2011-10-04 22:05 ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2011-10-04 17:47 UTC (permalink / raw)
  To: tglx; +Cc: linux-rt-users

Hello, Thomas,

A quick review of the RCU-related -rt patches, along with a list of
RCU bug fixes on their way to mainline.  One of these is not yet in
-tip, so the patch is included.

No smoking guns, at least unless the system is being hammered by a
network-based DoS attack.

							Thanx, Paul

------------------------------------------------------------------------

fs-add-missing-rcu-protection.patch

	Just adds a rcu_read_lock()/rcu_read_unlock() pair.
	Should be inoffensive.

rcu-reduce-lock-section.patch

	Prevents a wakeup within an irq-disabled raw spinlock.
	But which wakeup?

	o	rcu_report_exp_rnp() is only permitted to do a wakeup
		if called with interrupts enabled.

	o	All calls enable wakeups -except- for the call from
		sync_rcu_preempt_exp_init(), but in that case, there
		is nothing to wake up -- the caller itself is doing
		the initialization.

	In theory, this should be OK.  If it were broken, the symptom
	would be an expedited grace period blocking forever.

signal-fix-up-rcu-wreckage.patch

	Makes __lock_task_sighand() do local_irq_save_nort() instead
	of local_irq_save().  The RCU implementation should be OK with
	this.  The signal implementation might or might not.

sched-might-sleep-do-not-account-rcu-depth.patch

	Yow!!!  For CONFIG_PREEMPT_RT_FULL, this redefines
	sched_rcu_preempt_depth() to always return zero.

	This means that might_sleep() will never complain about
	blocking in an RCU read-side critical section.  I guess that
	this is necessary, though it would be better to have some
	way to complain for general sleeping (e.g., waiting on
	network receive) as opposed to blocking on a lock that is
	subject to priority inheritance.

rcu-force-preempt-rcu-for-rt.patch

	This one should be obsoleted by commit 8008e129dc9, which is
	queued in -tip, hopefully for 3.2.

	Except for the RT_GROUP_SCHED change, which is unrelated.

rcu-disable-the-rcu-bh-stuff-for-rt.patch

	This implements RCU-bh in terms of RCU-preempt, but disables
	BH around the resulting RCU-preempt read-side critical section.
	May be vulnerable to network-based denial-of-service attacks,
	which could OOM a system with this patch.

	The implementation of rcu_read_lock_bh_held() is weak, but OK.  In
	an ideal world, it would also complain if not local_bh_disable().

_paul_e__mckenney_-eliminate_-_rcu_boosted.patch

	Looks fine, but I might not be the best reviewer for this one.

peter_zijlstra-frob-rcu.patch

	Looks OK.  Hmmm...  Should this one go to mainline?
	Oh, looks equivalent, actually.  So why the change?

Possible fixes from the 3.2-bound RCU commits in -tip:

7eb4f455 (Make rcu_implicit_dynticks_qs() locals be correct size)

	Symptoms: misbehavior on 32-bit systems after a given CPU went
		through about 2^30 dyntick-idle transitions.  You would
		have to work pretty hard to get this one to happen.

5c51dd73 (Prevent early boot set_need_resched() from __rcu_pending())

	Symptoms: boot-time hangs for rare configurations.

037067a1 (Prohibit grace periods during early boot)

	Symptoms: boot-time hangs for rare configurations.

06ae115a (Avoid having just-onlined CPU resched itself when RCU is idle)

	Symptoms: boot-time hangs for rare configurations.

5b61b0ba (Wire up RCU_BOOST_PRIO for rcutree)

	Symptoms: The system uses RT priority 1 for boosting regardless
		of the value of CONFIG_RCU_BOOST_PRIO.

e90c53d3 (Remove rcu_needs_cpu_flush() to avoid false quiescent states)

	Symptoms: Too-short grace periods for RCU_FAST_NO_HZ=y.
		But simpler to set RCU_FAST_NO_HZ=n.

And the following is a patch to a theoretical problem with expedited
grace periods.

------------------------------------------------------------------------

rcu: Avoid RCU-preempt expedited grace-period botch

Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:

1.	Task A exits its RCU read-side critical section, and removes
	itself from the ->blkd_tasks list, releases rnp->lock, and is
	then preempted.  Task B remains on the ->blkd_tasks list, and
	blocks the current expedited grace period.

2.	Task B exits from its RCU read-side critical section and removes
	itself from the ->blkd_tasks list.  Because it is the last task
	blocking the current expedited grace period, it ends that
	expedited grace period.

3.	Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
	of course indicates that nothing is blocking the nonexistent
	expedited grace period. Task A is again preempted.

4.	Some other CPU starts an expedited grace period.  There are several
	tasks blocking this expedited grace period queued on the
	same rcu_node structure that Task A was using in step 1 above.

5.	Task A examines its state and incorrectly concludes that it was
	the last task blocking the expedited grace period on the current
	rcu_node structure.  It therefore reports completion up the
	rcu_node tree.

6.	The expedited grace period can then incorrectly complete before
	the tasks blocked on this same rcu_node structure exit their
	RCU read-side critical sections.  Arbitrarily bad things happen.

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4b9b9f8..7986053 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -312,6 +312,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
 	int empty_exp;
+	int empty_exp_now;
 	unsigned long flags;
 	struct list_head *np;
 #ifdef CONFIG_RCU_BOOST
@@ -382,8 +383,10 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 		/*
 		 * If this was the last task on the current list, and if
 		 * we aren't waiting on any CPUs, report the quiescent state.
-		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock.
+		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
+		 * so we must take a snapshot of the expedited state.
 		 */
+		empty_exp_now = !rcu_preempted_readers_exp(rnp);
 		if (!empty && !rcu_preempt_blocked_readers_cgp(rnp)) {
 			trace_rcu_quiescent_state_report("preempt_rcu",
 							 rnp->gpnum,
@@ -406,7 +409,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 		 * If this was the last task on the expedited lists,
 		 * then we need to report up the rcu_node hierarchy.
 		 */
-		if (!empty_exp && !rcu_preempted_readers_exp(rnp))
+		if (!empty_exp && empty_exp_now)
 			rcu_report_exp_rnp(&rcu_preempt_state, rnp);
 	} else {
 		local_irq_restore(flags);


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Quick review of -rt RCU-related patches
  2011-10-04 17:47 Quick review of -rt RCU-related patches Paul E. McKenney
@ 2011-10-04 22:05 ` Thomas Gleixner
  2011-10-04 22:12   ` Peter Zijlstra
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Thomas Gleixner @ 2011-10-04 22:05 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-rt-users, LKML, Peter Zijlstra

Paul,

On Tue, 4 Oct 2011, Paul E. McKenney wrote:
> rcu-reduce-lock-section.patch
> 
> 	Prevents a wakeup within an irq-disabled raw spinlock.
> 	But which wakeup?

The call comes from:

    synchronize_rcu_expedited()
	raw_spin_lock_irqsave(&rsp->onofflock, flags);
        sync_rcu_preempt_exp_init()
	   rcu_report_exp_rnp()
    
 
> 	o	rcu_report_exp_rnp() is only permitted to do a wakeup
> 		if called with interrupts enabled.

That's not what the code does, obviously.
 
> 	o	All calls enable wakeups -except- for the call from
> 		sync_rcu_preempt_exp_init(), but in that case, there
> 		is nothing to wake up -- the caller itself is doing
> 		the initialization.

Right, though the problem is that rcu_report_exp_rnp() unconditionally
calls wake_up() which takes the wait_queue lock (a "sleeping spinlock"
on RT). So the patch merely prevents the call when called from
synchronize_rcu_expedited() ...
 
> signal-fix-up-rcu-wreckage.patch
> 
> 	Makes __lock_task_sighand() do local_irq_save_nort() instead
> 	of local_irq_save().  The RCU implementation should be OK with
> 	this.  The signal implementation might or might not.

It does :)
 
> sched-might-sleep-do-not-account-rcu-depth.patch
> 
> 	Yow!!!  For CONFIG_PREEMPT_RT_FULL, this redefines
> 	sched_rcu_preempt_depth() to always return zero.
> 
> 	This means that might_sleep() will never complain about
> 	blocking in an RCU read-side critical section.  I guess that
> 	this is necessary, though it would be better to have some
> 	way to complain for general sleeping (e.g., waiting on
> 	network receive) as opposed to blocking on a lock that is
> 	subject to priority inheritance.

Well, there's always a remaining problem. We need that stuff fully
preemptible on rt. Any ideas ?
 
> rcu-force-preempt-rcu-for-rt.patch
> 
> 	This one should be obsoleted by commit 8008e129dc9, which is
> 	queued in -tip, hopefully for 3.2.

Ok.
 
> 	Except for the RT_GROUP_SCHED change, which is unrelated.

Huch, that should be in a different patch
 
> rcu-disable-the-rcu-bh-stuff-for-rt.patch
> 
> 	This implements RCU-bh in terms of RCU-preempt, but disables
> 	BH around the resulting RCU-preempt read-side critical section.
> 	May be vulnerable to network-based denial-of-service attacks,
> 	which could OOM a system with this patch.
> 
> 	The implementation of rcu_read_lock_bh_held() is weak, but OK.	In
> 	an ideal world, it would also complain if not local_bh_disable().

Well, I dropped that after our IRC conversation, but we still need to
have some extra debugging for RT to diagnose situations where we break
those rcu_bh assumptions. That _bh rcu stuff should have never been
there; we should rather drop the softirq processing back to
ksoftirqd in such an overload case (in mainline) and voluntarily
schedule away from ksoftirqd until the situation is resolved.

I consider rcu_bh a bandaid for the nasty implicit semantics of BH
processing, and I'm really looking forward to Peter's analysis of the
modern cpu local BKL constructs at RTLWS.

The patch stemmed from an earlier discussion about getting rid of
those special rcu_bh semantics even in mainline for the sake of not
making a special case for a completely over(ab)used construct which
originates from the historically grown softirq behaviour. I think that
keeping the special cased rcu_bh stuff around is just taking any
incentive away from moving that whole softirq processing into a
scheduler controllable environment (i.e. threaded interrupts). 

> _paul_e__mckenney_-eliminate_-_rcu_boosted.patch
> 
> 	Looks fine, but I might not be the best reviewer for this one.

:)
 
> peter_zijlstra-frob-rcu.patch
> 
> 	Looks OK.  Hmmm...  Should this one go to mainline?
> 	Oh, looks equivalent, actually.  So why the change?

Peter ?
 
> Possible fixes from the 3.2-bound RCU commits in -tip:
> 
> 7eb4f455 (Make rcu_implicit_dynticks_qs() locals be correct size)
> 
> 	Symptoms: misbehavior on 32-bit systems after a given CPU went
> 		through about 2^30 dyntick-idle transitions.  You would
> 		have to work pretty hard to get this one to happen.

Definitely, and most of our test systems run with nohz=off
 
> 5c51dd73 (Prevent early boot set_need_resched() from __rcu_pending())
> 
> 	Symptoms: boot-time hangs for rare configurations.

Never seen that one

> 037067a1 (Prohibit grace periods during early boot)
> 
> 	Symptoms: boot-time hangs for rare configurations.

Ditto
 
> 06ae115a (Avoid having just-onlined CPU resched itself when RCU is idle)
> 
> 	Symptoms: boot-time hangs for rare configurations.

Ditto
 
> 5b61b0ba (Wire up RCU_BOOST_PRIO for rcutree)
> 
> 	Symptoms: The system uses RT priority 1 for boosting regardless
> 		of the value of CONFIG_RCU_BOOST_PRIO.

Makes sense
 
> e90c53d3 (Remove rcu_needs_cpu_flush() to avoid false quiescent states)
> 
> 	Symptoms: Too-short grace periods for RCU_FAST_NO_HZ=y.
> 		But simpler to set RCU_FAST_NO_HZ=n.

We probably don't have that on
 
> And the following is a patch to a theoretical problem with expedited
> grace periods.
> 
> ------------------------------------------------------------------------
> 
> rcu: Avoid RCU-preempt expedited grace-period botch

Were you able to actually trigger that?

Thanks,

	tglx


* Re: Quick review of -rt RCU-related patches
  2011-10-04 22:05 ` Thomas Gleixner
@ 2011-10-04 22:12   ` Peter Zijlstra
  2011-10-04 23:15     ` Paul E. McKenney
  2011-10-04 22:13   ` Peter Zijlstra
  2011-10-04 23:15   ` Paul E. McKenney
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2011-10-04 22:12 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Paul E. McKenney, linux-rt-users, LKML

On Wed, 2011-10-05 at 00:05 +0200, Thomas Gleixner wrote:
> > peter_zijlstra-frob-rcu.patch
> > 
> >       Looks OK.  Hmmm...  Should this one go to mainline?
> >       Oh, looks equivalent, actually.  So why the change?
> 
> Peter ? 

-       if (in_irq() || in_serving_softirq()) {
+       if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {


For !rt it's equivalent, yes; for rt, OTOH, it's not:

int in_serving_softirq(void)
{
        int res;

        preempt_disable();
        res = __get_cpu_var(local_softirq_runner) == current;
        preempt_enable();
        return res;
}

However invoke_softirq() will still add SOFTIRQ_OFFSET so we need to
look at that to avoid recursion issues.

The changelog describes this. So this change is a direct consequence of
-rt frobbing the softirq stuff and thus isn't needed upstream. 


* Re: Quick review of -rt RCU-related patches
  2011-10-04 22:05 ` Thomas Gleixner
  2011-10-04 22:12   ` Peter Zijlstra
@ 2011-10-04 22:13   ` Peter Zijlstra
  2011-10-04 23:15   ` Paul E. McKenney
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2011-10-04 22:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Paul E. McKenney, linux-rt-users, LKML

On Wed, 2011-10-05 at 00:05 +0200, Thomas Gleixner wrote:
> I consider rcu_bh a bandaid for the nasty implict semantics of BH
> processing and I'm looking really forward to Peters analysis of the
> modern cpu local BKL constructs at RTLWS. 

You and me both! Oh wait.. :-)


* Re: Quick review of -rt RCU-related patches
  2011-10-04 22:05 ` Thomas Gleixner
  2011-10-04 22:12   ` Peter Zijlstra
  2011-10-04 22:13   ` Peter Zijlstra
@ 2011-10-04 23:15   ` Paul E. McKenney
  2011-10-04 23:27     ` Thomas Gleixner
  2 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2011-10-04 23:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users, LKML, Peter Zijlstra

On Wed, Oct 05, 2011 at 12:05:47AM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Tue, 4 Oct 2011, Paul E. McKenney wrote:
> > rcu-reduce-lock-section.patch
> > 
> > 	Prevents a wakeup within an irq-disabled raw spinlock.
> > 	But which wakeup?
> 
> The call comes from:
> 
>     synchronize_rcu_expedited()
> 	raw_spin_lock_irqsave(&rsp->onofflock, flags);
>         sync_rcu_preempt_exp_init()
> 	   rcu_report_exp_rnp()
>     
> 
> > 	o	rcu_report_exp_rnp() is only permitted to do a wakeup
> > 		if called with interrupts enabled.
> 
> That's not what the code does, obviously.
> 
> > 	o	All calls enable wakeups -except- for the call from
> > 		sync_rcu_preempt_exp_init(), but in that case, there
> > 		is nothing to wake up -- the caller itself is doing
> > 		the initialization.
> 
> Right, though the problem is that rcu_report_exp_rnp() unconditionally
> calls wake_up() which takes the wait_queue lock (a "sleeping spinlock"
> on RT). So the patch merely prevents the call when called from
> synchronize_rcu_expedited() ...

OK, got it.

> > signal-fix-up-rcu-wreckage.patch
> > 
> > 	Makes __lock_task_sighand() do local_irq_save_nort() instead
> > 	of local_irq_save().  The RCU implementation should be OK with
> > 	this.  The signal implementation might or might not.
> 
> It does :)
> 
> > sched-might-sleep-do-not-account-rcu-depth.patch
> > 
> > 	Yow!!!  For CONFIG_PREEMPT_RT_FULL, this redefines
> > 	sched_rcu_preempt_depth() to always return zero.
> > 
> > 	This means that might_sleep() will never complain about
> > 	blocking in an RCU read-side critical section.  I guess that
> > 	this is necessary, though it would be better to have some
> > 	way to complain for general sleeping (e.g., waiting on
> > 	network receive) as opposed to blocking on a lock that is
> > 	subject to priority inheritance.
> 
> Well, there's always a remaining problem. We need that stuff fully
> preemptible on rt. Any ideas ?

Not yet.  We would have to classify context switches into two groups:

1.	Preemptions or blocking waiting for sleeping spinlocks.

2.	Everything else.

Given that classification, it would be straightforward: prohibit
group #2 context switches while in RCU-preempt read-side critical
sections.  I know, easy for me to say!  ;-)

> > rcu-force-preempt-rcu-for-rt.patch
> > 
> > 	This one should be obsoleted by commit 8008e129dc9, which is
> > 	queued in -tip, hopefully for 3.2.
> 
> Ok.
> 
> > 	Except for the RT_GROUP_SCHED change, which is unrelated.
> 
> Huch, that should be in a different patch

I was wondering about that.

> > rcu-disable-the-rcu-bh-stuff-for-rt.patch
> > 
> > 	This implements RCU-bh in terms of RCU-preempt, but disables
> > 	BH around the resulting RCU-preempt read-side critical section.
> > 	May be vulnerable to network-based denial-of-service attacks,
> > 	which could OOM a system with this patch.
> > 
> > 	The implementation of rcu_read_lock_bh_held() is weak, but OK.  In
> > 	an ideal world, it would also complain if not local_bh_disable().
> 
> Well, I dropped that after our IRC conversation, but we still need to
> have some extra debugging for RT to diagnose situations where we break
> those rcu_bh assumptions. That _bh rcu stuff should have never been
> there; we should rather drop the softirq processing back to
> ksoftirqd in such an overload case (in mainline) and voluntarily
> schedule away from ksoftirqd until the situation is resolved.
> 
> I consider rcu_bh a bandaid for the nasty implicit semantics of BH
> processing, and I'm really looking forward to Peter's analysis of the
> modern cpu local BKL constructs at RTLWS.
> 
> The patch stemmed from an earlier discussion about getting rid of
> those special rcu_bh semantics even in mainline for the sake of not
> making a special case for a completely over(ab)used construct which
> originates from the historically grown softirq behaviour. I think that
> keeping the special cased rcu_bh stuff around is just taking any
> incentive away from moving that whole softirq processing into a
> scheduler controllable environment (i.e. threaded interrupts). 

Between -rt and the guys pushing packets, I can tell that this is going
to be a fun one.  ;-)

I will see if I can come up with a way to make that patch safe to
apply.  Might not be all that hard.

> > _paul_e__mckenney_-eliminate_-_rcu_boosted.patch
> > 
> > 	Looks fine, but I might not be the best reviewer for this one.
> 
> :)
> 
> > peter_zijlstra-frob-rcu.patch
> > 
> > 	Looks OK.  Hmmm...  Should this one go to mainline?
> > 	Oh, looks equivalent, actually.  So why the change?
> 
> Peter ?
> 
> > Possible fixes from the 3.2-bound RCU commits in -tip:
> > 
> > 7eb4f455 (Make rcu_implicit_dynticks_qs() locals be correct size)
> > 
> > 	Symptoms: misbehavior on 32-bit systems after a given CPU went
> > 		through about 2^30 dyntick-idle transitions.  You would
> > 		have to work pretty hard to get this one to happen.
> 
> Definitely, and most of our test systems run with nohz=off
> 
> > 5c51dd73 (Prevent early boot set_need_resched() from __rcu_pending())
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Never seen that one
> 
> > 037067a1 (Prohibit grace periods during early boot)
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Ditto
> 
> > 06ae115a (Avoid having just-onlined CPU resched itself when RCU is idle)
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Ditto
> 
> > 5b61b0ba (Wire up RCU_BOOST_PRIO for rcutree)
> > 
> > 	Symptoms: The system uses RT priority 1 for boosting regardless
> > 		of the value of CONFIG_RCU_BOOST_PRIO.
> 
> Makes sense
> 
> > e90c53d3 (Remove rcu_needs_cpu_flush() to avoid false quiescent states)
> > 
> > 	Symptoms: Too-short grace periods for RCU_FAST_NO_HZ=y.
> > 		But simpler to set RCU_FAST_NO_HZ=n.
> 
> We probably don't have that on

I hope not.  ;-)

> > And the following is a patch to a theoretical problem with expedited
> > grace periods.
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Avoid RCU-preempt expedited grace-period botch
> 
> Were you able to actually trigger that?

Nope, by inspection only.  Which is a good thing, as this would likely
be very hard to reproduce and even harder to debug.

							Thanx, Paul

> Thanks,
> 
> 	tglx
> 


* Re: Quick review of -rt RCU-related patches
  2011-10-04 22:12   ` Peter Zijlstra
@ 2011-10-04 23:15     ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2011-10-04 23:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-rt-users, LKML

On Wed, Oct 05, 2011 at 12:12:35AM +0200, Peter Zijlstra wrote:
> On Wed, 2011-10-05 at 00:05 +0200, Thomas Gleixner wrote:
> > > peter_zijlstra-frob-rcu.patch
> > > 
> > >       Looks OK.  Hmmm...  Should this one go to mainline?
> > >       Oh, looks equivalent, actually.  So why the change?
> > 
> > Peter ? 
> 
> -       if (in_irq() || in_serving_softirq()) {
> +       if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
> 
> 
> For !rt it's equivalent, yes; for rt, OTOH, it's not:
> 
> int in_serving_softirq(void)
> {
>         int res;
> 
>         preempt_disable();
>         res = __get_cpu_var(local_softirq_runner) == current;
>         preempt_enable();
>         return res;
> }
> 
> However invoke_softirq() will still add SOFTIRQ_OFFSET so we need to
> look at that to avoid recursion issues.
> 
> The changelog describes this. So this change is a direct consequence of
> -rt frobbing the softirq stuff and thus isn't needed upstream. 

Ah, thank you for the explanation!

							Thanx, Paul


* Re: Quick review of -rt RCU-related patches
  2011-10-04 23:15   ` Paul E. McKenney
@ 2011-10-04 23:27     ` Thomas Gleixner
  2011-10-04 23:56       ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2011-10-04 23:27 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-rt-users, LKML, Peter Zijlstra

On Tue, 4 Oct 2011, Paul E. McKenney wrote:
> On Wed, Oct 05, 2011 at 12:05:47AM +0200, Thomas Gleixner wrote:
> > > 	This means that might_sleep() will never complain about
> > > 	blocking in an RCU read-side critical section.  I guess that
> > > 	this is necessary, though it would be better to have some
> > > 	way to complain for general sleeping (e.g., waiting on
> > > 	network receive) as opposed to blocking on a lock that is
> > > 	subject to priority inheritance.
> > 
> > Well, there's always a remaining problem. We need that stuff fully
> > preemptible on rt. Any ideas ?
> 
> Not yet.  We would have to classify context switches into two groups:
> 
> 1.	Preemptions or blocking waiting for sleeping spinlocks.
> 
> 2.	Everything else.
> 
> Given that classification, it would be straightforward: prohibit
> group #2 context switches while in RCU-preempt read-side critical
> sections.  I know, easy for me to say!  ;-)

Well, you know the preemptible regions of RT; it basically boils down
to #1 - except that it differs a bit from vanilla in that locks and bh
stuff do not prevent preemption.

If RCU can deal with that, then #2 is a non issue :)
 
> > > rcu-disable-the-rcu-bh-stuff-for-rt.patch
> > > 
> > > 	This implements RCU-bh in terms of RCU-preempt, but disables
> > > 	BH around the resulting RCU-preempt read-side critical section.
> > > 	May be vulnerable to network-based denial-of-service attacks,
> > > 	which could OOM a system with this patch.
> > > 
> > > 	The implementation of rcu_read_lock_bh_held() is weak, but OK.  In
> > > 	an ideal world, it would also complain if not local_bh_disable().
> > 
> > Well, I dropped that after our IRC conversation, but we still need to
> > have some extra debugging for RT to diagnose situations where we break
> > those rcu_bh assumptions. That _bh rcu stuff should have never been
> > there; we should rather drop the softirq processing back to
> > ksoftirqd in such an overload case (in mainline) and voluntarily
> > schedule away from ksoftirqd until the situation is resolved.
> > 
> > I consider rcu_bh a bandaid for the nasty implicit semantics of BH
> > processing, and I'm really looking forward to Peter's analysis of the
> > modern cpu local BKL constructs at RTLWS.
> > 
> > The patch stemmed from an earlier discussion about getting rid of
> > those special rcu_bh semantics even in mainline for the sake of not
> > making a special case for a completely over(ab)used construct which
> > originates from the historically grown softirq behaviour. I think that
> > keeping the special cased rcu_bh stuff around is just taking any
> > incentive away from moving that whole softirq processing into a
> > scheduler controllable environment (i.e. threaded interrupts). 
> 
> Between -rt and the guys pushing packets, I can tell that this is going
> to be a fun one.  ;-)

We'll see. At some point they'll find out that a thread context will
make their life easier simply because the locking maze can be
disentangled. That's why I'm ranting at the special "foster _bh" rcu
abominations, which keep them thinking that going down that road is
actually a good thing.
 
> I will see if I can come up with a way to make that patch safe to
> apply.  Might not be all that hard.

:)
 
Thanks,

	tglx


* Re: Quick review of -rt RCU-related patches
  2011-10-04 23:27     ` Thomas Gleixner
@ 2011-10-04 23:56       ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2011-10-04 23:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users, LKML, Peter Zijlstra

On Wed, Oct 05, 2011 at 01:27:15AM +0200, Thomas Gleixner wrote:
> On Tue, 4 Oct 2011, Paul E. McKenney wrote:
> > On Wed, Oct 05, 2011 at 12:05:47AM +0200, Thomas Gleixner wrote:
> > > > 	This means that might_sleep() will never complain about
> > > > 	blocking in an RCU read-side critical section.  I guess that
> > > > 	this is necessary, though it would be better to have some
> > > > 	way to complain for general sleeping (e.g., waiting on
> > > > 	network receive) as opposed to blocking on a lock that is
> > > > 	subject to priority inheritance.
> > > 
> > > Well, there's always a remaining problem. We need that stuff fully
> > > preemptible on rt. Any ideas ?
> > 
> > Not yet.  We would have to classify context switches into two groups:
> > 
> > 1.	Preemptions or blocking waiting for sleeping spinlocks.
> > 
> > 2.	Everything else.
> > 
> > Given that classification, it would be straightforward: prohibit
> > group #2 context switches while in RCU-preempt read-side critical
> > sections.  I know, easy for me to say!  ;-)
> 
> Well, you know the preemptible regions of RT; it basically boils down
> to #1 - except that it differs a bit from vanilla in that locks and bh
> stuff do not prevent preemption.
> 
> If RCU can deal with that, then #2 is a non issue :)

Ah!  I thought you were asking for something that checked for people
violating -rt's rules for context switches within RCU-preempt read-side
critical sections.

> > > > rcu-disable-the-rcu-bh-stuff-for-rt.patch
> > > > 
> > > > 	This implements RCU-bh in terms of RCU-preempt, but disables
> > > > 	BH around the resulting RCU-preempt read-side critical section.
> > > > 	May be vulnerable to network-based denial-of-service attacks,
> > > > 	which could OOM a system with this patch.
> > > > 
> > > > 	The implementation of rcu_read_lock_bh_held() is weak, but OK.  In
> > > > 	an ideal world, it would also complain if not local_bh_disable().
> > > 
> > > Well, I dropped that after our IRC conversation, but we still need to
> > > have some extra debugging for RT to diagnose situations where we break
> > > those rcu_bh assumptions. That _bh rcu stuff should have never been
> > > there; we should rather drop the softirq processing back to
> > > ksoftirqd in such an overload case (in mainline) and voluntarily
> > > schedule away from ksoftirqd until the situation is resolved.
> > > 
> > > I consider rcu_bh a bandaid for the nasty implicit semantics of BH
> > > processing, and I'm really looking forward to Peter's analysis of the
> > > modern cpu local BKL constructs at RTLWS.
> > > 
> > > The patch stemmed from an earlier discussion about getting rid of
> > > those special rcu_bh semantics even in mainline for the sake of not
> > > making a special case for a completely over(ab)used construct which
> > > originates from the historically grown softirq behaviour. I think that
> > > keeping the special cased rcu_bh stuff around is just taking any
> > > incentive away from moving that whole softirq processing into a
> > > scheduler controllable environment (i.e. threaded interrupts). 
> > 
> > Between -rt and the guys pushing packets, I can tell that this is going
> > to be a fun one.  ;-)
> 
> We'll see. At some point they'll find out that a thread context will
> make their life easier simply because the locking maze can be
> disentangled. That's why I'm ranting at the special "foster _bh" rcu
> abominations, which keep them thinking that going down that road is
> actually a good thing.
> 
> > I will see if I can come up with a way to make that patch safe to
> > apply.  Might not be all that hard.
> 
> :)

Easier now that "git clone" works for me once more.  Re-installed my
system, but my notes from doing it last time were incomplete.  They
are better now.  ;-)

							Thanx, Paul


