linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
@ 2018-10-14 21:29 Joel Fernandes (Google)
  2018-10-14 23:17 ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes (Google) @ 2018-10-14 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Jonathan Corbet, Josh Triplett, Lai Jiangshan, linux-doc,
	Mathieu Desnoyers, Paul E. McKenney, Steven Rostedt

The Requirements.html document says "Disabling Preemption Does Not Block
Grace Periods". However, this is no longer true with the RCU
consolidation. Let's remove the obsolete (non-)requirement entirely.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 .../RCU/Design/Requirements/Requirements.html | 50 -------------------
 1 file changed, 50 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 7efc1c1da7af..4fae55056c1d 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -900,8 +900,6 @@ Except where otherwise noted, these non-guarantees were premeditated.
 	Grace Periods Don't Partition Read-Side Critical Sections</a>
 <li>	<a href="#Read-Side Critical Sections Don't Partition Grace Periods">
 	Read-Side Critical Sections Don't Partition Grace Periods</a>
-<li>	<a href="#Disabling Preemption Does Not Block Grace Periods">
-	Disabling Preemption Does Not Block Grace Periods</a>
 </ol>
 
 <h3><a name="Readers Impose Minimal Ordering">Readers Impose Minimal Ordering</a></h3>
@@ -1259,54 +1257,6 @@ of RCU grace periods.
 <tr><td>&nbsp;</td></tr>
 </table>
 
-<h3><a name="Disabling Preemption Does Not Block Grace Periods">
-Disabling Preemption Does Not Block Grace Periods</a></h3>
-
-<p>
-There was a time when disabling preemption on any given CPU would block
-subsequent grace periods.
-However, this was an accident of implementation and is not a requirement.
-And in the current Linux-kernel implementation, disabling preemption
-on a given CPU in fact does not block grace periods, as Oleg Nesterov
-<a href="https://lkml.kernel.org/g/20150614193825.GA19582@redhat.com">demonstrated</a>.
-
-<p>
-If you need a preempt-disable region to block grace periods, you need to add
-<tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>, for example
-as follows:
-
-<blockquote>
-<pre>
- 1 preempt_disable();
- 2 rcu_read_lock();
- 3 do_something();
- 4 rcu_read_unlock();
- 5 preempt_enable();
- 6
- 7 /* Spinlocks implicitly disable preemption. */
- 8 spin_lock(&amp;mylock);
- 9 rcu_read_lock();
-10 do_something();
-11 rcu_read_unlock();
-12 spin_unlock(&amp;mylock);
-</pre>
-</blockquote>
-
-<p>
-In theory, you could enter the RCU read-side critical section first,
-but it is more efficient to keep the entire RCU read-side critical
-section contained in the preempt-disable region as shown above.
-Of course, RCU read-side critical sections that extend outside of
-preempt-disable regions will work correctly, but such critical sections
-can be preempted, which forces <tt>rcu_read_unlock()</tt> to do
-more work.
-And no, this is <i>not</i> an invitation to enclose all of your RCU
-read-side critical sections within preempt-disable regions, because
-doing so would degrade real-time response.
-
-<p>
-This non-requirement appeared with preemptible RCU.
-
 <h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2>
 
 <p>
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-14 21:29 [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption Joel Fernandes (Google)
@ 2018-10-14 23:17 ` Paul E. McKenney
  2018-10-15  2:08   ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-14 23:17 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> The Requirements.html document says "Disabling Preemption Does Not Block
> Grace Periods". However this is no longer true with the RCU
> consolidation. Lets remove the obsolete (non-)requirement entirely.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Good catch, queued, thank you!

							Thanx, Paul

> ---
>  .../RCU/Design/Requirements/Requirements.html | 50 -------------------
>  1 file changed, 50 deletions(-)
> 
> diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
> index 7efc1c1da7af..4fae55056c1d 100644
> --- a/Documentation/RCU/Design/Requirements/Requirements.html
> +++ b/Documentation/RCU/Design/Requirements/Requirements.html
> @@ -900,8 +900,6 @@ Except where otherwise noted, these non-guarantees were premeditated.
>  	Grace Periods Don't Partition Read-Side Critical Sections</a>
>  <li>	<a href="#Read-Side Critical Sections Don't Partition Grace Periods">
>  	Read-Side Critical Sections Don't Partition Grace Periods</a>
> -<li>	<a href="#Disabling Preemption Does Not Block Grace Periods">
> -	Disabling Preemption Does Not Block Grace Periods</a>
>  </ol>
>  
>  <h3><a name="Readers Impose Minimal Ordering">Readers Impose Minimal Ordering</a></h3>
> @@ -1259,54 +1257,6 @@ of RCU grace periods.
>  <tr><td>&nbsp;</td></tr>
>  </table>
>  
> -<h3><a name="Disabling Preemption Does Not Block Grace Periods">
> -Disabling Preemption Does Not Block Grace Periods</a></h3>
> -
> -<p>
> -There was a time when disabling preemption on any given CPU would block
> -subsequent grace periods.
> -However, this was an accident of implementation and is not a requirement.
> -And in the current Linux-kernel implementation, disabling preemption
> -on a given CPU in fact does not block grace periods, as Oleg Nesterov
> -<a href="https://lkml.kernel.org/g/20150614193825.GA19582@redhat.com">demonstrated</a>.
> -
> -<p>
> -If you need a preempt-disable region to block grace periods, you need to add
> -<tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>, for example
> -as follows:
> -
> -<blockquote>
> -<pre>
> - 1 preempt_disable();
> - 2 rcu_read_lock();
> - 3 do_something();
> - 4 rcu_read_unlock();
> - 5 preempt_enable();
> - 6
> - 7 /* Spinlocks implicitly disable preemption. */
> - 8 spin_lock(&amp;mylock);
> - 9 rcu_read_lock();
> -10 do_something();
> -11 rcu_read_unlock();
> -12 spin_unlock(&amp;mylock);
> -</pre>
> -</blockquote>
> -
> -<p>
> -In theory, you could enter the RCU read-side critical section first,
> -but it is more efficient to keep the entire RCU read-side critical
> -section contained in the preempt-disable region as shown above.
> -Of course, RCU read-side critical sections that extend outside of
> -preempt-disable regions will work correctly, but such critical sections
> -can be preempted, which forces <tt>rcu_read_unlock()</tt> to do
> -more work.
> -And no, this is <i>not</i> an invitation to enclose all of your RCU
> -read-side critical sections within preempt-disable regions, because
> -doing so would degrade real-time response.
> -
> -<p>
> -This non-requirement appeared with preemptible RCU.
> -
>  <h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2>
>  
>  <p>
> -- 
> 2.19.0.605.g01d371f741-goog
> 



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-14 23:17 ` Paul E. McKenney
@ 2018-10-15  2:08   ` Joel Fernandes
  2018-10-15  2:13     ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15  2:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > The Requirements.html document says "Disabling Preemption Does Not Block
> > Grace Periods". However this is no longer true with the RCU
> > consolidation. Lets remove the obsolete (non-)requirement entirely.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Good catch, queued, thank you!

Thanks! By the way, after I sent the patch, I also tried Oleg's experiment to
confirm that this is indeed obsolete.  :)

One interesting thing came up when I tried synchronize_rcu_expedited()
instead of synchronize_rcu() in Oleg's experiment: I still saw a
multi-millisecond delay between when the RCU read-side section completed
and when synchronize_rcu_expedited() returned:

For example, with synchronize_rcu_expedited(), the 'SPIN done' and the 'SYNC
done' messages are about 3 milliseconds apart:
[   77.599142] SPIN start
[   77.601595] SYNC start
[   82.604950] SPIN done!
[   82.607836] SYNC done!
I saw anywhere from 2 to 6 milliseconds.

The reason I bring this up is that according to Requirements.html, "In some
cases, the multi-millisecond synchronize_rcu() latencies are unacceptable.
In these cases, synchronize_rcu_expedited() may be used instead"... so either
I messed something up in the experiment, or I need to update this part of the document ;-)

thanks,

 - Joel




* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  2:08   ` Joel Fernandes
@ 2018-10-15  2:13     ` Joel Fernandes
  2018-10-15  2:33       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15  2:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > > The Requirements.html document says "Disabling Preemption Does Not Block
> > > Grace Periods". However this is no longer true with the RCU
> > > consolidation. Lets remove the obsolete (non-)requirement entirely.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Good catch, queued, thank you!
> 
> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> confirm that this is indeed obsolete.  :)
> 
> One thing interesting came up when I tried synchronize_rcu_expedited()
> instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> millisecond delay between when the rcu read section completely and the
> synchronize_rcu_expedited returns:
> 
> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> done' are about 3 millisecond apart:
> [   77.599142] SPIN start
> [   77.601595] SYNC start
> [   82.604950] SPIN done!
> [   82.607836] SYNC done!
>  I saw anywhere from 2-6 milliseconds.
> 
> The reason I bring this up is according to Requirements.html: In some cases,
> the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> something up in the experiment, or I need to update this part of the document ;-)

I then realized I'm running in QEMU, so it could also be a scheduling delay
of the vCPU thread. Apologies for the noise if the experiment works fine
for you.

thanks,

 - Joel


* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  2:13     ` Joel Fernandes
@ 2018-10-15  2:33       ` Paul E. McKenney
  2018-10-15  2:47         ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-15  2:33 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > > On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > > > The Requirements.html document says "Disabling Preemption Does Not Block
> > > > Grace Periods". However this is no longer true with the RCU
> > > > consolidation. Lets remove the obsolete (non-)requirement entirely.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Good catch, queued, thank you!
> > 
> > Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > confirm that this is indeed obsolete.  :)
> > 
> > One thing interesting came up when I tried synchronize_rcu_expedited()
> > instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> > millisecond delay between when the rcu read section completely and the
> > synchronize_rcu_expedited returns:
> > 
> > For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > done' are about 3 millisecond apart:
> > [   77.599142] SPIN start
> > [   77.601595] SYNC start
> > [   82.604950] SPIN done!
> > [   82.607836] SYNC done!
> >  I saw anywhere from 2-6 milliseconds.
> > 
> > The reason I bring this up is according to Requirements.html: In some cases,
> > the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> > cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> > something up in the experiment, or I need to update this part of the document ;-)

In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
you using?  (My guess is CONFIG_PREEMPT=y.)

> So I realized I'm running in Qemu so it could also be a scheduling delay of
> the vcpu thread. So apologies about the noise if the experiment works fine
> for you.

I used rcuperf, which might not be doing the same thing as Oleg's
experiment.

							Thanx, Paul



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  2:33       ` Paul E. McKenney
@ 2018-10-15  2:47         ` Joel Fernandes
  2018-10-15  2:50           ` Joel Fernandes
  2018-10-15  6:05           ` Nikolay Borisov
  0 siblings, 2 replies; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15  2:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> > On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > > On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > > > > The Requirements.html document says "Disabling Preemption Does Not Block
> > > > > Grace Periods". However this is no longer true with the RCU
> > > > > consolidation. Lets remove the obsolete (non-)requirement entirely.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Good catch, queued, thank you!
> > > 
> > > Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > > confirm that this is indeed obsolete.  :)
> > > 
> > > One thing interesting came up when I tried synchronize_rcu_expedited()
> > > instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> > > millisecond delay between when the rcu read section completely and the
> > > synchronize_rcu_expedited returns:
> > > 
> > > For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > > done' are about 3 millisecond apart:
> > > [   77.599142] SPIN start
> > > [   77.601595] SYNC start
> > > [   82.604950] SPIN done!
> > > [   82.607836] SYNC done!
> > >  I saw anywhere from 2-6 milliseconds.
> > > 
> > > The reason I bring this up is according to Requirements.html: In some cases,
> > > the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> > > cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> > > something up in the experiment, or I need to update this part of the document ;-)
> 
> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> you using?  (My guess is CONFIG_PREEMPT=y.)

The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.

> > So I realized I'm running in Qemu so it could also be a scheduling delay of
> > the vcpu thread. So apologies about the noise if the experiment works fine
> > for you.
> 
> I used rcuperf, which might not be doing the same thing as Oleg's
> experiment.

The experiment is mentioned at:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html

If you apply the below diff, it applies cleanly on rcu/dev. And then run:
taskset 2 perl -e 'syscall 157, 666, 5000' &
taskset 1 perl -e 'syscall 157, 777'

diff --git a/kernel/sys.c b/kernel/sys.c
index cf5c67533ff1..b654b7566ca3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 	return -EINVAL;
 }
 
+#include <linux/delay.h>
+
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 
 	error = 0;
 	switch (option) {
+	case 666:
+		preempt_disable();
+		pr_crit("SPIN start\n");
+		while (arg2--)
+			mdelay(1);
+		pr_crit("SPIN done!\n");
+		preempt_enable();
+		break;
+	case 777:
+		pr_crit("SYNC start\n");
+		synchronize_rcu();
+		pr_crit("SYNC done!\n");
+		break;
 	case PR_SET_PDEATHSIG:
 		if (!valid_signal(arg2)) {
 			error = -EINVAL;


* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  2:47         ` Joel Fernandes
@ 2018-10-15  2:50           ` Joel Fernandes
  2018-10-15  6:05           ` Nikolay Borisov
  1 sibling, 0 replies; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15  2:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt

On Sun, Oct 14, 2018 at 07:47:58PM -0700, Joel Fernandes wrote:
> On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> > On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> > > On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > > > On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > > > > > The Requirements.html document says "Disabling Preemption Does Not Block
> > > > > > Grace Periods". However this is no longer true with the RCU
> > > > > > consolidation. Lets remove the obsolete (non-)requirement entirely.
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > 
> > > > > Good catch, queued, thank you!
> > > > 
> > > > Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > > > confirm that this is indeed obsolete.  :)
> > > > 
> > > > One thing interesting came up when I tried synchronize_rcu_expedited()
> > > > instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> > > > millisecond delay between when the rcu read section completely and the
> > > > synchronize_rcu_expedited returns:
> > > > 
> > > > For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > > > done' are about 3 millisecond apart:
> > > > [   77.599142] SPIN start
> > > > [   77.601595] SYNC start
> > > > [   82.604950] SPIN done!
> > > > [   82.607836] SYNC done!
> > > >  I saw anywhere from 2-6 milliseconds.
> > > > 
> > > > The reason I bring this up is according to Requirements.html: In some cases,
> > > > the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> > > > cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> > > > something up in the experiment, or I need to update this part of the document ;-)
> > 
> > In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> > point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> > you using?  (My guess is CONFIG_PREEMPT=y.)
> 
> The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> 
> > > So I realized I'm running in Qemu so it could also be a scheduling delay of
> > > the vcpu thread. So apologies about the noise if the experiment works fine
> > > for you.
> > 
> > I used rcuperf, which might not be doing the same thing as Oleg's
> > experiment.
> 
> The experiment is mentioned at:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> 
> If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> taskset 2 perl -e 'syscall 157, 666, 5000' &
> taskset 1 perl -e 'syscall 157, 777'
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index cf5c67533ff1..b654b7566ca3 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>  	return -EINVAL;
>  }
>  
> +#include <linux/delay.h>
> +
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  
>  	error = 0;
>  	switch (option) {
> +	case 666:
> +		preempt_disable();
> +		pr_crit("SPIN start\n");
> +		while (arg2--)
> +			mdelay(1);
> +		pr_crit("SPIN done!\n");
> +		preempt_enable();
> +		break;
> +	case 777:
> +		pr_crit("SYNC start\n");
> +		synchronize_rcu();

By the way, just to mention: this needs to be changed to
synchronize_rcu_expedited() and then rebuilt. With that change I see the
multi-millisecond delay.

thanks,

 - Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  2:47         ` Joel Fernandes
  2018-10-15  2:50           ` Joel Fernandes
@ 2018-10-15  6:05           ` Nikolay Borisov
  2018-10-15 11:21             ` Paul E. McKenney
  1 sibling, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2018-10-15  6:05 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney
  Cc: linux-kernel, Jonathan Corbet, Josh Triplett, Lai Jiangshan,
	linux-doc, Mathieu Desnoyers, Steven Rostedt



On 15.10.2018 05:47, Joel Fernandes wrote:
> On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
>> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
>>> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
>>>> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
>>>>> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
>>>>>> The Requirements.html document says "Disabling Preemption Does Not Block
>>>>>> Grace Periods". However this is no longer true with the RCU
>>>>>> consolidation. Lets remove the obsolete (non-)requirement entirely.
>>>>>>
>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>>>>
>>>>> Good catch, queued, thank you!
>>>>
>>>> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
>>>> confirm that this is indeed obsolete.  :)
>>>>
>>>> One thing interesting came up when I tried synchronize_rcu_expedited()
>>>> instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
>>>> millisecond delay between when the rcu read section completely and the
>>>> synchronize_rcu_expedited returns:
>>>>
>>>> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
>>>> done' are about 3 millisecond apart:
>>>> [   77.599142] SPIN start
>>>> [   77.601595] SYNC start
>>>> [   82.604950] SPIN done!
>>>> [   82.607836] SYNC done!
>>>>  I saw anywhere from 2-6 milliseconds.
>>>>
>>>> The reason I bring this up is according to Requirements.html: In some cases,
>>>> the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
>>>> cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
>>>> something up in the experiment, or I need to update this part of the document ;-)
>>
>> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
>> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
>> you using?  (My guess is CONFIG_PREEMPT=y.)
> 
> The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> 
>>> So I realized I'm running in Qemu so it could also be a scheduling delay of
>>> the vcpu thread. So apologies about the noise if the experiment works fine
>>> for you.
>>
>> I used rcuperf, which might not be doing the same thing as Oleg's
>> experiment.
> 
> The experiment is mentioned at:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> 
> If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> taskset 2 perl -e 'syscall 157, 666, 5000' &
> taskset 1 perl -e 'syscall 157, 777'
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index cf5c67533ff1..b654b7566ca3 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>  	return -EINVAL;
>  }
>  
> +#include <linux/delay.h>
> +
> +
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
>  {
> @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  
>  	error = 0;
>  	switch (option) {
> +	case 666:
> +		preempt_disable();
> +		pr_crit("SPIN start\n");
> +		while (arg2--)
> +			mdelay(1);
> +		pr_crit("SPIN done!\n");
> +		preempt_enable();
> +		break;
> +	case 777:
> +		pr_crit("SYNC start\n");
> +		synchronize_rcu();
> +		pr_crit("SYNC done!\n");

But you are using the console printing infrastructure, which is rather
heavyweight. Try replacing the pr_* calls with trace_printk() so that you
write to the lock-free ring buffer instead; this will reduce the noise from
the heavy console printing path.
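For example, a hypothetical fragment on top of the debug patch above
(untested sketch of the substitution, not from the thread; trace_printk()
output lands in the ftrace ring buffer, readable from
/sys/kernel/debug/tracing/trace):

```diff
-		pr_crit("SYNC start\n");
+		trace_printk("SYNC start\n");
 		synchronize_rcu();
-		pr_crit("SYNC done!\n");
+		trace_printk("SYNC done!\n");
```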


> +		break;
>  	case PR_SET_PDEATHSIG:
>  		if (!valid_signal(arg2)) {
>  			error = -EINVAL;
> 


* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15  6:05           ` Nikolay Borisov
@ 2018-10-15 11:21             ` Paul E. McKenney
  2018-10-15 19:39               ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-15 11:21 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Joel Fernandes, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 09:05:22AM +0300, Nikolay Borisov wrote:
> On 15.10.2018 05:47, Joel Fernandes wrote:
> > On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> >> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> >>> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> >>>> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> >>>>> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> >>>>>> The Requirements.html document says "Disabling Preemption Does Not Block
> >>>>>> Grace Periods". However this is no longer true with the RCU
> >>>>>> consolidation. Lets remove the obsolete (non-)requirement entirely.
> >>>>>>
> >>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >>>>>
> >>>>> Good catch, queued, thank you!
> >>>>
> >>>> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> >>>> confirm that this is indeed obsolete.  :)
> >>>>
> >>>> One thing interesting came up when I tried synchronize_rcu_expedited()
> >>>> instead of synchronize_rcu() in Oleg's experiment, I still saw a multiple
> >>>> millisecond delay between when the rcu read section completely and the
> >>>> synchronize_rcu_expedited returns:
> >>>>
> >>>> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> >>>> done' are about 3 millisecond apart:
> >>>> [   77.599142] SPIN start
> >>>> [   77.601595] SYNC start
> >>>> [   82.604950] SPIN done!
> >>>> [   82.607836] SYNC done!
> >>>>  I saw anywhere from 2-6 milliseconds.
> >>>>
> >>>> The reason I bring this up is according to Requirements.html: In some cases,
> >>>> the multi-millisecond synchronize_rcu() latencies are unacceptable. In these
> >>>> cases, synchronize_rcu_expedited() may be used instead,.. so either I messed
> >>>> something up in the experiment, or I need to update this part of the document ;-)
> >>
> >> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> >> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> >> you using?  (My guess is CONFIG_PREEMPT=y.)
> > 
> > The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> > 
> >>> So I realized I'm running in Qemu so it could also be a scheduling delay of
> >>> the vcpu thread. So apologies about the noise if the experiment works fine
> >>> for you.
> >>
> >> I used rcuperf, which might not be doing the same thing as Oleg's
> >> experiment.
> > 
> > The experiment is mentioned at:
> > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> > 
> > If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> > taskset 2 perl -e 'syscall 157, 666, 5000' &
> > taskset 1 perl -e 'syscall 157, 777'
> > 
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index cf5c67533ff1..b654b7566ca3 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
> >  	return -EINVAL;
> >  }
> >  
> > +#include <linux/delay.h>
> > +
> > +
> >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		unsigned long, arg4, unsigned long, arg5)
> >  {
> > @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  
> >  	error = 0;
> >  	switch (option) {
> > +	case 666:
> > +		preempt_disable();
> > +		pr_crit("SPIN start\n");
> > +		while (arg2--)
> > +			mdelay(1);

OK, this is the problem.  When you spin in the kernel for several
milliseconds with preemption disabled, the consolidated grace period
is -required- to wait for this preemption-disabled reader to complete,
whether expedited or not.

So, expected behavior.  ;-)

In any case, please don't spin for milliseconds with preemption disabled.
The real-time guys are unlikely to be happy with you if you do this!

> > +		pr_crit("SPIN done!\n");
> > +		preempt_enable();
> > +		break;
> > +	case 777:
> > +		pr_crit("SYNC start\n");
> > +		synchronize_rcu();
> > +		pr_crit("SYNC done!\n");
> 
> But you are using the console printing infrastructure which is rather
> heavyweight. Try replacing pr_* calls with trace_printk so that you
> write to the lock-free ring buffer, this will reduce the noise from the
> heavy console printing infrastructure.

And this might be a problem as well.

							Thanx, Paul

> > +		break;
> >  	case PR_SET_PDEATHSIG:
> >  		if (!valid_signal(arg2)) {
> >  			error = -EINVAL;
> > 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15 11:21             ` Paul E. McKenney
@ 2018-10-15 19:39               ` Joel Fernandes
  2018-10-15 19:54                 ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15 19:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 04:21:12AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 15, 2018 at 09:05:22AM +0300, Nikolay Borisov wrote:
> > On 15.10.2018 05:47, Joel Fernandes wrote:
> > > On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> > >> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> > >>> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > >>>> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > >>>>> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > >>>>>> The Requirements.html document says "Disabling Preemption Does Not Block
> > >>>>>> Grace Periods". However this is no longer true with the RCU
> > >>>>>> consolidation. Lets remove the obsolete (non-)requirement entirely.
> > >>>>>>
> > >>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > >>>>>
> > >>>>> Good catch, queued, thank you!
> > >>>>
> > >>>> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > >>>> confirm that this is indeed obsolete.  :)
> > >>>>
> > >>>> One interesting thing came up when I tried synchronize_rcu_expedited()
> > >>>> instead of synchronize_rcu() in Oleg's experiment: I still saw a delay of
> > >>>> multiple milliseconds between when the RCU read-side section completed and
> > >>>> when synchronize_rcu_expedited() returned:
> > >>>>
> > >>>> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > >>>> done' are about 3 milliseconds apart:
> > >>>> [   77.599142] SPIN start
> > >>>> [   77.601595] SYNC start
> > >>>> [   82.604950] SPIN done!
> > >>>> [   82.607836] SYNC done!
> > >>>>  I saw anywhere from 2-6 milliseconds.
> > >>>>
> > >>>> The reason I bring this up is that, according to Requirements.html, "In some
> > >>>> cases, the multi-millisecond synchronize_rcu() latencies are unacceptable. In
> > >>>> these cases, synchronize_rcu_expedited() may be used instead", so either I
> > >>>> messed something up in the experiment, or I need to update this part of the
> > >>>> document ;-)
> > >>
> > >> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> > >> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> > >> you using?  (My guess is CONFIG_PREEMPT=y.)
> > > 
> > > The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> > > 
> > >>> So I realized I'm running in Qemu so it could also be a scheduling delay of
> > >>> the vcpu thread. So apologies about the noise if the experiment works fine
> > >>> for you.
> > >>
> > >> I used rcuperf, which might not be doing the same thing as Oleg's
> > >> experiment.
> > > 
> > > The experiment is mentioned at:
> > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> > > 
> > > If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> > > taskset 2 perl -e 'syscall 157, 666, 5000' &
> > > taskset 1 perl -e 'syscall 157, 777'
> > > 
> > > diff --git a/kernel/sys.c b/kernel/sys.c
> > > index cf5c67533ff1..b654b7566ca3 100644
> > > --- a/kernel/sys.c
> > > +++ b/kernel/sys.c
> > > @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
> > >  	return -EINVAL;
> > >  }
> > >  
> > > +#include <linux/delay.h>
> > > +
> > > +
> > >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > >  		unsigned long, arg4, unsigned long, arg5)
> > >  {
> > > @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > >  
> > >  	error = 0;
> > >  	switch (option) {
> > > +	case 666:
> > > +		preempt_disable();
> > > +		pr_crit("SPIN start\n");
> > > +		while (arg2--)
> > > +			mdelay(1);
> 
> OK, this is the problem.  When you spin in the kernel for several
> milliseconds with preemption disabled, the consolidated grace period
> is -required- to wait for this preemption-disabled reader to complete,
> whether expedited or not.
> 
> So, expected behavior.  ;-)

Cool. Thanks for confirming. I ran some tests too, and if I reduce the
preempt_disabled section's duration, then the delay from
synchronize_rcu_expedited() is indeed much smaller.

> In any case, please don't spin for milliseconds with preemption disabled.
> The real-time guys are unlikely to be happy with you if you do this!

Well, just to clarify, I was just running Oleg's test, which did this. The
test was mentioned in the original documentation that I deleted. Of course I
would not dare do such a thing in production code :-D. And in Oleg's
defense, he did it to verify that synchronize_rcu() was not blocked on
preempt-disable sections, which was a different test.

> > > +		pr_crit("SPIN done!\n");
> > > +		preempt_enable();
> > > +		break;
> > > +	case 777:
> > > +		pr_crit("SYNC start\n");
> > > +		synchronize_rcu();
> > > +		pr_crit("SYNC done!\n");
> > 
> > But you are using the console printing infrastructure which is rather
> > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > write to the lock-free ring buffer, this will reduce the noise from the
> > heavy console printing infrastructure.
> 
> And this might be a problem as well.

This was not the issue (or at least not the whole issue), since I saw the
same thing with trace_printk. It was exactly what you said: the excessively
long preempt-disabled times.

thanks!

 - Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15 19:39               ` Joel Fernandes
@ 2018-10-15 19:54                 ` Paul E. McKenney
  2018-10-15 20:15                   ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-15 19:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 12:39:51PM -0700, Joel Fernandes wrote:
> On Mon, Oct 15, 2018 at 04:21:12AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 15, 2018 at 09:05:22AM +0300, Nikolay Borisov wrote:
> > > On 15.10.2018 05:47, Joel Fernandes wrote:
> > > > On Sun, Oct 14, 2018 at 07:33:28PM -0700, Paul E. McKenney wrote:
> > > >> On Sun, Oct 14, 2018 at 07:13:49PM -0700, Joel Fernandes wrote:
> > > >>> On Sun, Oct 14, 2018 at 07:08:27PM -0700, Joel Fernandes wrote:
> > > >>>> On Sun, Oct 14, 2018 at 04:17:31PM -0700, Paul E. McKenney wrote:
> > > >>>>> On Sun, Oct 14, 2018 at 02:29:55PM -0700, Joel Fernandes (Google) wrote:
> > > >>>>>> The Requirements.html document says "Disabling Preemption Does Not Block
> > > >>>>>> Grace Periods". However this is no longer true with the RCU
> > > >>>>>> consolidation. Lets remove the obsolete (non-)requirement entirely.
> > > >>>>>>
> > > >>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > >>>>>
> > > >>>>> Good catch, queued, thank you!
> > > >>>>
> > > >>>> Thanks! By the way after I sent the patch, I also tried Oleg's experiment to
> > > >>>> confirm that this is indeed obsolete.  :)
> > > >>>>
> > > >>>> One interesting thing came up when I tried synchronize_rcu_expedited()
> > > >>>> instead of synchronize_rcu() in Oleg's experiment: I still saw a delay of
> > > >>>> multiple milliseconds between when the RCU read-side section completed and
> > > >>>> when synchronize_rcu_expedited() returned:
> > > >>>>
> > > >>>> For example, with synchronize_rcu_expedited, the 'SPIN done' and the 'SYNC
> > > >>>> done' are about 3 milliseconds apart:
> > > >>>> [   77.599142] SPIN start
> > > >>>> [   77.601595] SYNC start
> > > >>>> [   82.604950] SPIN done!
> > > >>>> [   82.607836] SYNC done!
> > > >>>>  I saw anywhere from 2-6 milliseconds.
> > > >>>>
> > > >>>> The reason I bring this up is that, according to Requirements.html, "In some
> > > >>>> cases, the multi-millisecond synchronize_rcu() latencies are unacceptable. In
> > > >>>> these cases, synchronize_rcu_expedited() may be used instead", so either I
> > > >>>> messed something up in the experiment, or I need to update this part of the
> > > >>>> document ;-)
> > > >>
> > > >> In normal testing, 2-6 milliseconds is indeed excessive.  Could you please
> > > >> point me at Oleg's experiment?  Also, what CONFIG_PREEMPT setting were
> > > >> you using?  (My guess is CONFIG_PREEMPT=y.)
> > > > 
> > > > The CONFIG_PREEMPT config I am using is CONFIG_PREEMPT=y.
> > > > 
> > > >>> So I realized I'm running in Qemu so it could also be a scheduling delay of
> > > >>> the vcpu thread. So apologies about the noise if the experiment works fine
> > > >>> for you.
> > > >>
> > > >> I used rcuperf, which might not be doing the same thing as Oleg's
> > > >> experiment.
> > > > 
> > > > The experiment is mentioned at:
> > > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg912055.html
> > > > 
> > > > If you apply the below diff, it applies cleanly on rcu/dev. And then run:
> > > > taskset 2 perl -e 'syscall 157, 666, 5000' &
> > > > taskset 1 perl -e 'syscall 157, 777'
> > > > 
> > > > diff --git a/kernel/sys.c b/kernel/sys.c
> > > > index cf5c67533ff1..b654b7566ca3 100644
> > > > --- a/kernel/sys.c
> > > > +++ b/kernel/sys.c
> > > > @@ -2261,6 +2261,9 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
> > > >  	return -EINVAL;
> > > >  }
> > > >  
> > > > +#include <linux/delay.h>
> > > > +
> > > > +
> > > >  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > > >  		unsigned long, arg4, unsigned long, arg5)
> > > >  {
> > > > @@ -2274,6 +2277,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > > >  
> > > >  	error = 0;
> > > >  	switch (option) {
> > > > +	case 666:
> > > > +		preempt_disable();
> > > > +		pr_crit("SPIN start\n");
> > > > +		while (arg2--)
> > > > +			mdelay(1);
> > 
> > OK, this is the problem.  When you spin in the kernel for several
> > milliseconds with preemption disabled, the consolidated grace period
> > is -required- to wait for this preemption-disabled reader to complete,
> > whether expedited or not.
> > 
> > So, expected behavior.  ;-)
> 
> Cool. Thanks for confirming. I ran some tests too, and if I reduce the
> preempt_disabled section's duration, then the delay from
> synchronize_rcu_expedited() is indeed much smaller.

Good to hear!

> > In any case, please don't spin for milliseconds with preemption disabled.
> > The real-time guys are unlikely to be happy with you if you do this!
> 
> Well, just to clarify, I was just running Oleg's test, which did this. The
> test was mentioned in the original documentation that I deleted. Of course I
> would not dare do such a thing in production code :-D. And in Oleg's
> defense, he did it to verify that synchronize_rcu() was not blocked on
> preempt-disable sections, which was a different test.

Understood!  Just pointing out that RCU's tolerating a given action does
not necessarily mean that it is a good idea to take that action.  ;-)

> > > > +		pr_crit("SPIN done!\n");
> > > > +		preempt_enable();
> > > > +		break;
> > > > +	case 777:
> > > > +		pr_crit("SYNC start\n");
> > > > +		synchronize_rcu();
> > > > +		pr_crit("SYNC done!\n");
> > > 
> > > But you are using the console printing infrastructure which is rather
> > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > write to the lock-free ring buffer, this will reduce the noise from the
> > > heavy console printing infrastructure.
> > 
> > And this might be a problem as well.
> 
> This was not the issue (or at least not the whole issue), since I saw the
> same thing with trace_printk. It was exactly what you said: the excessively
> long preempt-disabled times.

One approach would be to apply this patch against (say) v4.18, which
does not have consolidated grace periods.  You might then be able to
tell if the pr_crit() calls make any difference.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15 19:54                 ` Paul E. McKenney
@ 2018-10-15 20:15                   ` Joel Fernandes
  2018-10-15 21:08                     ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-15 20:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
[...]
> > > In any case, please don't spin for milliseconds with preemption disabled.
> > > The real-time guys are unlikely to be happy with you if you do this!
> > 
> > Well, just to clarify, I was just running Oleg's test, which did this. The
> > test was mentioned in the original documentation that I deleted. Of course I
> > would not dare do such a thing in production code :-D. And in Oleg's
> > defense, he did it to verify that synchronize_rcu() was not blocked on
> > preempt-disable sections, which was a different test.
> 
> Understood!  Just pointing out that RCU's tolerating a given action does
> not necessarily mean that it is a good idea to take that action.  ;-)

Makes sense :-) thanks.

> > > > > +		pr_crit("SPIN done!\n");
> > > > > +		preempt_enable();
> > > > > +		break;
> > > > > +	case 777:
> > > > > +		pr_crit("SYNC start\n");
> > > > > +		synchronize_rcu();
> > > > > +		pr_crit("SYNC done!\n");
> > > > 
> > > > But you are using the console printing infrastructure which is rather
> > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > heavy console printing infrastructure.
> > > 
> > > And this might be a problem as well.
> > 
> > This was not the issue (or at least not the whole issue), since I saw the
> > same thing with trace_printk. It was exactly what you said: the excessively
> > long preempt-disabled times.
> 
> One approach would be to apply this patch against (say) v4.18, which
> does not have consolidated grace periods.  You might then be able to
> tell if the pr_crit() calls make any difference.

I could do that, yeah. But since the original problem went away once
preemption was disabled for only a short while, I will move on and continue
to focus on updating other parts of the documentation. Just to mention, I
brought this up because I thought it was better to do that than not to, just
in case there is any lurking issue with the consolidation. Sorry if that
ended up with me being noisy.

Just curious, while I am going through the documentation, is there anything
in particular that sticks out to you as needing updating? I think I am
around 50% there with the last several rounds of doc patches, but I have a
lot more to go through. "Just keep doing what you're doing" is also a
perfectly valid answer ;-)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15 20:15                   ` Joel Fernandes
@ 2018-10-15 21:08                     ` Paul E. McKenney
  2018-10-16 11:26                       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-15 21:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> [...]
> > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > The real-time guys are unlikely to be happy with you if you do this!
> > > 
> > > Well, just to clarify, I was just running Oleg's test, which did this. The
> > > test was mentioned in the original documentation that I deleted. Of course I
> > > would not dare do such a thing in production code :-D. And in Oleg's
> > > defense, he did it to verify that synchronize_rcu() was not blocked on
> > > preempt-disable sections, which was a different test.
> > 
> > Understood!  Just pointing out that RCU's tolerating a given action does
> > not necessarily mean that it is a good idea to take that action.  ;-)
> 
> Makes sense :-) thanks.

Don't worry, that won't happen again.  ;-)

> > > > > > +		pr_crit("SPIN done!\n");
> > > > > > +		preempt_enable();
> > > > > > +		break;
> > > > > > +	case 777:
> > > > > > +		pr_crit("SYNC start\n");
> > > > > > +		synchronize_rcu();
> > > > > > +		pr_crit("SYNC done!\n");
> > > > > 
> > > > > But you are using the console printing infrastructure which is rather
> > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > heavy console printing infrastructure.
> > > > 
> > > > And this might be a problem as well.
> > > 
> > > This was not the issue (or at least not the whole issue), since I saw the
> > > same thing with trace_printk. It was exactly what you said: the excessively
> > > long preempt-disabled times.
> > 
> > One approach would be to apply this patch against (say) v4.18, which
> > does not have consolidated grace periods.  You might then be able to
> > tell if the pr_crit() calls make any difference.
> 
> I could do that, yeah. But since the original problem went away once
> preemption was disabled for only a short while, I will move on and continue
> to focus on updating other parts of the documentation. Just to mention, I
> brought this up because I thought it was better to do that than not to, just
> in case there is any lurking issue with the consolidation. Sorry if that
> ended up with me being noisy.

Not a problem, no need to apologize!

> Just curious, while I am going through the documentation, is there anything
> in particular that sticks out to you as needing updating? I think I am
> around 50% there with the last several rounds of doc patches, but I have a
> lot more to go through. "Just keep doing what you're doing" is also a
> perfectly valid answer ;-)

It is the things needing updating that I do not yet know about that worry
the most, so "Just keep doing what you're doing" seems most appropriate.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-15 21:08                     ` Paul E. McKenney
@ 2018-10-16 11:26                       ` Paul E. McKenney
  2018-10-16 20:41                         ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-16 11:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > 
> > > > Well, just to clarify, I was just running Oleg's test, which did this. The
> > > > test was mentioned in the original documentation that I deleted. Of course I
> > > > would not dare do such a thing in production code :-D. And in Oleg's
> > > > defense, he did it to verify that synchronize_rcu() was not blocked on
> > > > preempt-disable sections, which was a different test.
> > > 
> > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > not necessarily mean that it is a good idea to take that action.  ;-)
> > 
> > Makes sense :-) thanks.
> 
> Don't worry, that won't happen again.  ;-)
> 
> > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > +		preempt_enable();
> > > > > > > +		break;
> > > > > > > +	case 777:
> > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > +		synchronize_rcu();
> > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > 
> > > > > > But you are using the console printing infrastructure which is rather
> > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > heavy console printing infrastructure.
> > > > > 
> > > > > And this might be a problem as well.
> > > > 
> > > > This was not the issue (or at least not the whole issue), since I saw the
> > > > same thing with trace_printk. It was exactly what you said: the excessively
> > > > long preempt-disabled times.
> > > 
> > > One approach would be to apply this patch against (say) v4.18, which
> > > does not have consolidated grace periods.  You might then be able to
> > > tell if the pr_crit() calls make any difference.
> > 
> > I could do that, yeah. But since the original problem went away once
> > preemption was disabled for only a short while, I will move on and continue
> > to focus on updating other parts of the documentation. Just to mention, I
> > brought this up because I thought it was better to do that than not to, just
> > in case there is any lurking issue with the consolidation. Sorry if that
> > ended up with me being noisy.
> 
> Not a problem, no need to apologize!

Besides, digging through the code did point out a reasonable optimization.
In the common case, this would buy 100s of microseconds rather than
milliseconds, but it seems simple enough to be worthwhile.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 07921e8720907f58f82b142f2027fc56d5abdbfd
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Tue Oct 16 04:12:58 2018 -0700

    rcu: Speed up expedited GPs when interrupting RCU reader
    
    In PREEMPT kernels, an expedited grace period might send an IPI to a
    CPU that is executing an RCU read-side critical section.  In that case,
    it would be nice if the rcu_read_unlock() directly interacted with the
    RCU core code to immediately report the quiescent state.  And this does
    happen in the case where the reader has been preempted.  But it would
    also be a nice performance optimization if immediate reporting also
    happened in the preemption-free case.
    
    This commit therefore adds an ->exp_hint field to the task_struct structure's
    ->rcu_read_unlock_special field.  The IPI handler sets this hint when
    it has interrupted an RCU read-side critical section, and this causes
    the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
    which, if preemption is enabled, reports the quiescent state immediately.
    If preemption is disabled, then the report is required to be deferred
    until preemption (or bottom halves or interrupts or whatever) is re-enabled.
    
    Because this is a hint, it does nothing for more complicated cases.  For
    example, if the IPI interrupts an RCU reader, but interrupts are disabled
    across the rcu_read_unlock(), but another rcu_read_lock() is executed
    before interrupts are re-enabled, the hint will already have been cleared.
    If you do crazy things like this, reporting will be deferred until some
    later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
    
    Reported-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 004ca21f7e80..64ce751b5fe9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -571,8 +571,10 @@ union rcu_special {
 	struct {
 		u8			blocked;
 		u8			need_qs;
+		u8			exp_hint; /* Hint for performance. */
+		u8			pad; /* No garbage from compiler! */
 	} b; /* Bits. */
-	u16 s; /* Set of bits. */
+	u32 s; /* Set of bits. */
 };
 
 enum perf_event_task_context {
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index e669ccf3751b..928fe5893a57 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
 	 */
 	if (t->rcu_read_lock_nesting > 0) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		if (rnp->expmask & rdp->grpmask)
+		if (rnp->expmask & rdp->grpmask) {
 			rdp->deferred_qs = true;
+			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
+		}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 8b48bb7c224c..d6286eb6e77e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	local_irq_save(flags);
 	irqs_were_disabled = irqs_disabled_flags(flags);
 	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
-	    t->rcu_read_unlock_special.b.blocked) {
+	    t->rcu_read_unlock_special.s) {
 		/* Need to defer quiescent state until everything is enabled. */
+		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
 		raise_softirq_irqoff(RCU_SOFTIRQ);
 		local_irq_restore(flags);
 		return;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-16 11:26                       ` Paul E. McKenney
@ 2018-10-16 20:41                         ` Joel Fernandes
  2018-10-17 16:11                           ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-16 20:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > 
> > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > test was mentioned in the original documentation that I deleted. Ofcourse I
> > > > > would not dare do such a thing in production code :-D. I guess to Oleg's
> > > > > defense, he did it to very that synchronize_rcu() was not blocked on
> > > > > preempt-disable sections which was a different test.
> > > > 
> > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > 
> > > Makes sense :-) thanks.
> > 
> > Don't worry, that won't happen again.  ;-)
> > 
> > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > +		preempt_enable();
> > > > > > > > +		break;
> > > > > > > > +	case 777:
> > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > +		synchronize_rcu();
> > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > 
> > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > heavy console printing infrastructure.
> > > > > > 
> > > > > > And this might be a problem as well.
> > > > > 
> > > > > This was not the issue (or atleast not fully the issue) since I saw the same
> > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > excessively long preempt disabled times.
> > > > 
> > > > One approach would be to apply this patch against (say) v4.18, which
> > > > does not have consolidated grace periods.  You might then be able to
> > > > tell if the pr_crit() calls make any difference.
> > > 
> > > I could do that, yeah. But since the original problem went away once
> > > preemption was disabled for only a short while, I will move on and continue
> > > to focus on updating other parts of the documentation. Just to mention, I
> > > brought this up because I thought it was better to do that than not to, just
> > > in case there is any lurking issue with the consolidation. Sorry if that
> > > ended up with me being noisy.
> > 
> > Not a problem, no need to apologize!
> 
> Besides, digging through the code did point out a reasonable optimization.
> In the common case, this would buy 100s of microseconds rather than
> milliseconds, but it seems simple enough to be worthwhile.  Thoughts?

Cool, thanks. One comment below:

> ------------------------------------------------------------------------
> 
> commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Tue Oct 16 04:12:58 2018 -0700
> 
>     rcu: Speed up expedited GPs when interrupting RCU reader
>     
>     In PREEMPT kernels, an expedited grace period might send an IPI to a
>     CPU that is executing an RCU read-side critical section.  In that case,
>     it would be nice if the rcu_read_unlock() directly interacted with the
>     RCU core code to immediately report the quiescent state.  And this does
>     happen in the case where the reader has been preempted.  But it would
>     also be a nice performance optimization if immediate reporting also
>     happened in the preemption-free case.
>     
>     This commit therefore adds an ->exp_hint field to the task_struct structure's
>     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
>     it has interrupted an RCU read-side critical section, and this causes
>     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
>     which, if preemption is enabled, reports the quiescent state immediately.
>     If preemption is disabled, then the report is required to be deferred
>     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
>     
>     Because this is a hint, it does nothing for more complicated cases.  For
>     example, if the IPI interrupts an RCU reader, but interrupts are disabled
>     across the rcu_read_unlock(), but another rcu_read_lock() is executed
>     before interrupts are re-enabled, the hint will already have been cleared.
>     If you do crazy things like this, reporting will be deferred until some
>     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
>     
>     Reported-by: Joel Fernandes <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 004ca21f7e80..64ce751b5fe9 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -571,8 +571,10 @@ union rcu_special {
>  	struct {
>  		u8			blocked;
>  		u8			need_qs;
> +		u8			exp_hint; /* Hint for performance. */
> +		u8			pad; /* No garbage from compiler! */
>  	} b; /* Bits. */
> -	u16 s; /* Set of bits. */
> +	u32 s; /* Set of bits. */
>  };
>  
>  enum perf_event_task_context {
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index e669ccf3751b..928fe5893a57 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
>  	 */
>  	if (t->rcu_read_lock_nesting > 0) {
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> -		if (rnp->expmask & rdp->grpmask)
> +		if (rnp->expmask & rdp->grpmask) {
>  			rdp->deferred_qs = true;
> +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> +		}
>  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	}
>  
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 8b48bb7c224c..d6286eb6e77e 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  	local_irq_save(flags);
>  	irqs_were_disabled = irqs_disabled_flags(flags);
>  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> -	    t->rcu_read_unlock_special.b.blocked) {
> +	    t->rcu_read_unlock_special.s) {
>  		/* Need to defer quiescent state until everything is enabled. */
> +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>  		raise_softirq_irqoff(RCU_SOFTIRQ);

Still going through this patch, but it seems to me like the fact that
rcu_read_unlock_special is called means someone has requested a grace
period. In that case, does it not make sense to raise the softirq
for processing anyway?

thanks,

- Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-16 20:41                         ` Joel Fernandes
@ 2018-10-17 16:11                           ` Paul E. McKenney
  2018-10-17 18:15                             ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-17 16:11 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > > 
> > > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > > test was mentioned in the original documentation that I deleted. Of course I
> > > > > > would not dare do such a thing in production code :-D. I guess in Oleg's
> > > > > > defense, he did it to verify that synchronize_rcu() was not blocked on
> > > > > > preempt-disable sections which was a different test.
> > > > > 
> > > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > > 
> > > > Makes sense :-) thanks.
> > > 
> > > Don't worry, that won't happen again.  ;-)
> > > 
> > > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > > +		preempt_enable();
> > > > > > > > > +		break;
> > > > > > > > > +	case 777:
> > > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > > +		synchronize_rcu();
> > > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > > 
> > > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > > heavy console printing infrastructure.
> > > > > > > 
> > > > > > > And this might be a problem as well.
> > > > > > 
> > > > > > This was not the issue (or at least not fully the issue) since I saw the same
> > > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > > excessively long preempt disabled times.
> > > > > 
> > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > does not have consolidated grace periods.  You might then be able to
> > > > > tell if the pr_crit() calls make any difference.
> > > > 
> > > > I could do that, yeah. But since the original problem went away due to
> > > > disabling preempts for a short while, I will move on and continue to focus on
> > > > updating other parts of the documentation. Just to mention I
> > > > brought this up because I thought it's better to do that than not to, just
> > > > in case there is any lurking issue with the consolidation. Sorry if that ended
> > > > up with me being noisy.
> > > 
> > > Not a problem, no need to apologize!
> > 
> > Besides, digging through the code did point out a reasonable optimization.
> > In the common case, this would buy 100s of microseconds rather than
> > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> 
> Cool, thanks. One comment below:
> 
> > ------------------------------------------------------------------------
> > 
> > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Tue Oct 16 04:12:58 2018 -0700
> > 
> >     rcu: Speed up expedited GPs when interrupting RCU reader
> >     
> >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> >     CPU that is executing an RCU read-side critical section.  In that case,
> >     it would be nice if the rcu_read_unlock() directly interacted with the
> >     RCU core code to immediately report the quiescent state.  And this does
> >     happen in the case where the reader has been preempted.  But it would
> >     also be a nice performance optimization if immediate reporting also
> >     happened in the preemption-free case.
> >     
> >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> >     it has interrupted an RCU read-side critical section, and this causes
> >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> >     which, if preemption is enabled, reports the quiescent state immediately.
> >     If preemption is disabled, then the report is required to be deferred
> >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> >     
> >     Because this is a hint, it does nothing for more complicated cases.  For
> >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> >     before interrupts are re-enabled, the hint will already have been cleared.
> >     If you do crazy things like this, reporting will be deferred until some
> >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> >     
> >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 004ca21f7e80..64ce751b5fe9 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -571,8 +571,10 @@ union rcu_special {
> >  	struct {
> >  		u8			blocked;
> >  		u8			need_qs;
> > +		u8			exp_hint; /* Hint for performance. */
> > +		u8			pad; /* No garbage from compiler! */
> >  	} b; /* Bits. */
> > -	u16 s; /* Set of bits. */
> > +	u32 s; /* Set of bits. */
> >  };
> >  
> >  enum perf_event_task_context {
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index e669ccf3751b..928fe5893a57 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> >  	 */
> >  	if (t->rcu_read_lock_nesting > 0) {
> >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > -		if (rnp->expmask & rdp->grpmask)
> > +		if (rnp->expmask & rdp->grpmask) {
> >  			rdp->deferred_qs = true;
> > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > +		}
> >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  	}
> >  
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 8b48bb7c224c..d6286eb6e77e 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >  	local_irq_save(flags);
> >  	irqs_were_disabled = irqs_disabled_flags(flags);
> >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > -	    t->rcu_read_unlock_special.b.blocked) {
> > +	    t->rcu_read_unlock_special.s) {
> >  		/* Need to defer quiescent state until everything is enabled. */
> > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> 
> Still going through this patch, but it seems to me like the fact that
> rcu_read_unlock_special is called means someone has requested a grace
> period. In that case, does it not make sense to raise the softirq
> for processing anyway?

Not necessarily.  Another reason that rcu_read_unlock_special() might
be called is if the RCU read-side critical section had been preempted,
in which case there might not even be a grace period in progress.
In addition, if interrupts, bottom halves, and preemption are all enabled,
the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
raising softirq, as it can instead just immediately report the quiescent
state.

							Thanx, Paul



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-17 16:11                           ` Paul E. McKenney
@ 2018-10-17 18:15                             ` Joel Fernandes
  2018-10-17 20:33                               ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-17 18:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Wed, Oct 17, 2018 at 09:11:00AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> > On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > > [...]
> > > > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > > > 
> > > > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > > > test was mentioned in the original documentation that I deleted. Of course I
> > > > > > > would not dare do such a thing in production code :-D. I guess in Oleg's
> > > > > > > defense, he did it to verify that synchronize_rcu() was not blocked on
> > > > > > > preempt-disable sections which was a different test.
> > > > > > 
> > > > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > > > 
> > > > > Makes sense :-) thanks.
> > > > 
> > > > Don't worry, that won't happen again.  ;-)
> > > > 
> > > > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > > > +		preempt_enable();
> > > > > > > > > > +		break;
> > > > > > > > > > +	case 777:
> > > > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > > > +		synchronize_rcu();
> > > > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > > > 
> > > > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > > > heavy console printing infrastructure.
> > > > > > > > 
> > > > > > > > And this might be a problem as well.
> > > > > > > 
> > > > > > > This was not the issue (or at least not fully the issue) since I saw the same
> > > > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > > > excessively long preempt disabled times.
> > > > > > 
> > > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > > does not have consolidated grace periods.  You might then be able to
> > > > > > tell if the pr_crit() calls make any difference.
> > > > > 
> > > > > I could do that, yeah. But since the original problem went away due to
> > > > > disabling preempts for a short while, I will move on and continue to focus on
> > > > > updating other parts of the documentation. Just to mention I
> > > > > brought this up because I thought it's better to do that than not to, just
> > > > > in case there is any lurking issue with the consolidation. Sorry if that ended
> > > > > up with me being noisy.
> > > > 
> > > > Not a problem, no need to apologize!
> > > 
> > > Besides, digging through the code did point out a reasonable optimization.
> > > In the common case, this would buy 100s of microseconds rather than
> > > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> > 
> > Cool, thanks. One comment below:
> > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > 
> > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > >     
> > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > >     CPU that is executing an RCU read-side critical section.  In that case,
> > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > >     RCU core code to immediately report the quiescent state.  And this does
> > >     happen in the case where the reader has been preempted.  But it would
> > >     also be a nice performance optimization if immediate reporting also
> > >     happened in the preemption-free case.
> > >     
> > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > >     it has interrupted an RCU read-side critical section, and this causes
> > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > >     which, if preemption is enabled, reports the quiescent state immediately.
> > >     If preemption is disabled, then the report is required to be deferred
> > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > >     
> > >     Because this is a hint, it does nothing for more complicated cases.  For
> > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > >     before interrupts are re-enabled, the hint will already have been cleared.
> > >     If you do crazy things like this, reporting will be deferred until some
> > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > >     
> > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > 
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 004ca21f7e80..64ce751b5fe9 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -571,8 +571,10 @@ union rcu_special {
> > >  	struct {
> > >  		u8			blocked;
> > >  		u8			need_qs;
> > > +		u8			exp_hint; /* Hint for performance. */
> > > +		u8			pad; /* No garbage from compiler! */
> > >  	} b; /* Bits. */
> > > -	u16 s; /* Set of bits. */
> > > +	u32 s; /* Set of bits. */
> > >  };
> > >  
> > >  enum perf_event_task_context {
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index e669ccf3751b..928fe5893a57 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > >  	 */
> > >  	if (t->rcu_read_lock_nesting > 0) {
> > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > -		if (rnp->expmask & rdp->grpmask)
> > > +		if (rnp->expmask & rdp->grpmask) {
> > >  			rdp->deferred_qs = true;
> > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > +		}
> > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >  	}
> > >  
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > >  	local_irq_save(flags);
> > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > +	    t->rcu_read_unlock_special.s) {
> > >  		/* Need to defer quiescent state until everything is enabled. */
> > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > 
> > Still going through this patch, but it seems to me like the fact that
> > rcu_read_unlock_special is called means someone has requested a grace
> > period. In that case, does it not make sense to raise the softirq
> > for processing anyway?
> 
> Not necessarily.  Another reason that rcu_read_unlock_special() might
> be called is if the RCU read-side critical section had been preempted,
> in which case there might not even be a grace period in progress.

Yes true, it was at the back of my head ;) It needs to remove itself from the
blocked lists on the unlock. And of course the preemption case is also
clearly mentioned in this function's comments. (slaps self).

> In addition, if interrupts, bottom halves, and preemption are all enabled,
> the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> raising softirq, as it can instead just immediately report the quiescent
> state.

Makes sense. I will go through these code paths more today. Thank you for the
explanations!

I think something like need_exp_qs instead of 'exp_hint' may be more
descriptive?

Otherwise,
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>

thanks,

 - Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-17 18:15                             ` Joel Fernandes
@ 2018-10-17 20:33                               ` Paul E. McKenney
  2018-10-18  2:07                                 ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-17 20:33 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Wed, Oct 17, 2018 at 11:15:05AM -0700, Joel Fernandes wrote:
> On Wed, Oct 17, 2018 at 09:11:00AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> > > On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > > > [...]
> > > > > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > > > > 
> > > > > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > > > > test was mentioned in the original documentation that I deleted. Of course I
> > > > > > > > would not dare do such a thing in production code :-D. I guess in Oleg's
> > > > > > > > defense, he did it to verify that synchronize_rcu() was not blocked on
> > > > > > > > preempt-disable sections which was a different test.
> > > > > > > 
> > > > > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > > > > 
> > > > > > Makes sense :-) thanks.
> > > > > 
> > > > > Don't worry, that won't happen again.  ;-)
> > > > > 
> > > > > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > > > > +		preempt_enable();
> > > > > > > > > > > +		break;
> > > > > > > > > > > +	case 777:
> > > > > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > > > > +		synchronize_rcu();
> > > > > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > > > > 
> > > > > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > > > > heavy console printing infrastructure.
> > > > > > > > > 
> > > > > > > > > And this might be a problem as well.
> > > > > > > > 
> > > > > > > > This was not the issue (or at least not fully the issue) since I saw the same
> > > > > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > > > > excessively long preempt disabled times.
> > > > > > > 
> > > > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > > > does not have consolidated grace periods.  You might then be able to
> > > > > > > tell if the pr_crit() calls make any difference.
> > > > > > 
> > > > > > I could do that, yeah. But since the original problem went away due to
> > > > > > disabling preempts for a short while, I will move on and continue to focus on
> > > > > > updating other parts of the documentation. Just to mention I
> > > > > > brought this up because I thought it's better to do that than not to, just
> > > > > > in case there is any lurking issue with the consolidation. Sorry if that ended
> > > > > > up with me being noisy.
> > > > > 
> > > > > Not a problem, no need to apologize!
> > > > 
> > > > Besides, digging through the code did point out a reasonable optimization.
> > > > In the common case, this would buy 100s of microseconds rather than
> > > > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> > > 
> > > Cool, thanks. One comment below:
> > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > 
> > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > >     
> > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > >     RCU core code to immediately report the quiescent state.  And this does
> > > >     happen in the case where the reader has been preempted.  But it would
> > > >     also be a nice performance optimization if immediate reporting also
> > > >     happened in the preemption-free case.
> > > >     
> > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > >     it has interrupted an RCU read-side critical section, and this causes
> > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > >     If preemption is disabled, then the report is required to be deferred
> > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > >     
> > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > >     If you do crazy things like this, reporting will be deferred until some
> > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > >     
> > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > 
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > >  	struct {
> > > >  		u8			blocked;
> > > >  		u8			need_qs;
> > > > +		u8			exp_hint; /* Hint for performance. */
> > > > +		u8			pad; /* No garbage from compiler! */
> > > >  	} b; /* Bits. */
> > > > -	u16 s; /* Set of bits. */
> > > > +	u32 s; /* Set of bits. */
> > > >  };
> > > >  
> > > >  enum perf_event_task_context {
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index e669ccf3751b..928fe5893a57 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > >  	 */
> > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > -		if (rnp->expmask & rdp->grpmask)
> > > > +		if (rnp->expmask & rdp->grpmask) {
> > > >  			rdp->deferred_qs = true;
> > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > +		}
> > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  	}
> > > >  
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > >  	local_irq_save(flags);
> > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > +	    t->rcu_read_unlock_special.s) {
> > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > 
> > > Still going through this patch, but it seems to me like the fact that
> > > rcu_read_unlock_special is called means someone has requested a grace
> > > period. In that case, does it not make sense to raise the softirq
> > > for processing anyway?
> > 
> > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > be called is if the RCU read-side critical section had been preempted,
> > in which case there might not even be a grace period in progress.
> 
> Yes true, it was at the back of my head ;) It needs to remove itself from the
> blocked lists on the unlock. And of course the preemption case is also
> clearly mentioned in this function's comments. (slaps self).

Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)

> > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > raising softirq, as it can instead just immediately report the quiescent
> > state.
> 
> Makes sense. I will go through these code paths more today. Thank you for the
> explanations!
> 
> I think something like need_exp_qs instead of 'exp_hint' may be more
> descriptive?

Well, it is only a hint because it is not preserved across complex
sequences of overlapping RCU read-side critical sections of
different types.  So if you have the following sequence:

	rcu_read_lock();
	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
	preempt_disable();
	rcu_read_unlock(); /* Clears ->exp_hint. */
	preempt_enable(); /* But ->exp_hint is already cleared. */

This is OK because there will be some later event that passes the quiescent
state to the RCU core.  This will slow down the expedited grace period,
but this case should be uncommon.  If it does turn out to be common, then
some more complex scheme can be put in place.

Hmmm...  This patch does need some help, doesn't it?  How about the following
to be folded into the original?

							Thanx, Paul

> Otherwise,
> Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
> 
> thanks,
> 
>  - Joel

------------------------------------------------------------------------

commit d8d996385055d4708121fa253e04b4272119f5e2
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Wed Oct 17 13:32:25 2018 -0700

    fixup! rcu: Speed up expedited GPs when interrupting RCU reader
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d6286eb6e77e..117aeb582fdc 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		local_irq_restore(flags);
 		return;
 	}
+	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
 	rcu_preempt_deferred_qs_irqrestore(t, flags);
 }
 



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-17 20:33                               ` Paul E. McKenney
@ 2018-10-18  2:07                                 ` Joel Fernandes
  2018-10-18 14:46                                   ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-18  2:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Wed, Oct 17, 2018 at 01:33:24PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 17, 2018 at 11:15:05AM -0700, Joel Fernandes wrote:
> > On Wed, Oct 17, 2018 at 09:11:00AM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> > > > On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > > > > [...]
> > > > > > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > > > > > 
> > > > > > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > > > > > test was mentioned in the original documentation that I deleted. Ofcourse I
> > > > > > > > > would not dare do such a thing in production code :-D. I guess to Oleg's
> > > > > > > > > defense, he did it to very that synchronize_rcu() was not blocked on
> > > > > > > > > preempt-disable sections which was a different test.
> > > > > > > > 
> > > > > > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > > > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > > > > > 
> > > > > > > Makes sense :-) thanks.
> > > > > > 
> > > > > > Don't worry, that won't happen again.  ;-)
> > > > > > 
> > > > > > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > > > > > +		preempt_enable();
> > > > > > > > > > > > +		break;
> > > > > > > > > > > > +	case 777:
> > > > > > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > > > > > +		synchronize_rcu();
> > > > > > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > > > > > 
> > > > > > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > > > > > heavy console printing infrastructure.
> > > > > > > > > > 
> > > > > > > > > > And this might be a problem as well.
> > > > > > > > > 
> > > > > > > > > This was not the issue (or atleast not fully the issue) since I saw the same
> > > > > > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > > > > > excessively long preempt disabled times.
> > > > > > > > 
> > > > > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > > > > does not have consolidated grace periods.  You might then be able to
> > > > > > > > tell if the pr_crit() calls make any difference.
> > > > > > > 
> > > > > > > I could do that, yeah. But since the original problem went away due to
> > > > > > > disabling preempts for a short while, I will move on and continue to focus on
> > > > > > > updating other parts of the documenation. Just to mention I
> > > > > > > brought this up because I thought its better to do that than not to, just
> > > > > > > incase there is any lurking issue with the consolidation. Sorry if that ended
> > > > > > > up with me being noisy.
> > > > > > 
> > > > > > Not a problem, no need to apologize!
> > > > > 
> > > > > Besides, digging through the code did point out a reasonable optimization.
> > > > > In the common case, this would buy 100s of microseconds rather than
> > > > > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> > > > 
> > > > Cool, thanks. One comment below:
> > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > > 
> > > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > > >     
> > > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > > >     RCU core code to immediately report the quiescent state.  And this does
> > > > >     happen in the case where the reader has been preempted.  But it would
> > > > >     also be a nice performance optimization if immediate reporting also
> > > > >     happened in the preemption-free case.
> > > > >     
> > > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > > >     it has interrupted an RCU read-side critical section, and this causes
> > > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > > >     If preemption is disabled, then the report is required to be deferred
> > > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > > >     
> > > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > > >     If you do crazy things like this, reporting will be deferred until some
> > > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > > >     
> > > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > 
> > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > > --- a/include/linux/sched.h
> > > > > +++ b/include/linux/sched.h
> > > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > > >  	struct {
> > > > >  		u8			blocked;
> > > > >  		u8			need_qs;
> > > > > +		u8			exp_hint; /* Hint for performance. */
> > > > > +		u8			pad; /* No garbage from compiler! */
> > > > >  	} b; /* Bits. */
> > > > > -	u16 s; /* Set of bits. */
> > > > > +	u32 s; /* Set of bits. */
> > > > >  };
> > > > >  
> > > > >  enum perf_event_task_context {
> > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > index e669ccf3751b..928fe5893a57 100644
> > > > > --- a/kernel/rcu/tree_exp.h
> > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > > >  	 */
> > > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > -		if (rnp->expmask & rdp->grpmask)
> > > > > +		if (rnp->expmask & rdp->grpmask) {
> > > > >  			rdp->deferred_qs = true;
> > > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > > +		}
> > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > >  	}
> > > > >  
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > >  	local_irq_save(flags);
> > > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > > +	    t->rcu_read_unlock_special.s) {
> > > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > 
> > > > Still going through this patch, but it seems to me like the fact that
> > > > rcu_read_unlock_special is called means someone has requested for a grace
> > > > period. Then in that case, does it not make sense to raise the softirq
> > > > for processing anyway?
> > > 
> > > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > > be called is if the RCU read-side critical section had been preempted,
> > > in which case there might not even be a grace period in progress.
> > 
> > Yes true, it was at the back of my head ;) It needs to remove itself from the
> > blocked lists on the unlock. And ofcourse the preemption case is alsoo
> > clearly mentioned in this function's comments. (slaps self).
> 
> Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)
> 
> > > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > > raising softirq, as it can instead just immediately report the quiescent
> > > state.
> > 
> > Makes sense. I will go through these code paths more today. Thank you for the
> > explanations!
> > 
> > I think something like need_exp_qs instead of 'exp_hint' may be more
> > descriptive?
> 
> Well, it is only a hint due to the fact that it is not preserved across
> complex sequences of overlapping RCU read-side critical sections of
> different types.  So if you have the following sequence:
> 
> 	rcu_read_lock();
> 	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
> 	preempt_disable();
> 	rcu_read_unlock(); /* Clears ->exp_hint. */
> 	preempt_enable(); /* But ->exp_hint is already cleared. */
> 
> This is OK because there will be some later event that passes the quiescent
> state to the RCU core.  This will slow down the expedited grace period,
> but this case should be uncommon.  If it does turn out to be common, then
> some more complex scheme can be put in place.
> 
> Hmmm...  This patch does need some help, doesn't it?  How about the following
> to be folded into the original?
> 
> commit d8d996385055d4708121fa253e04b4272119f5e2
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Wed Oct 17 13:32:25 2018 -0700
> 
>     fixup! rcu: Speed up expedited GPs when interrupting RCU reader
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index d6286eb6e77e..117aeb582fdc 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		local_irq_restore(flags);
>  		return;
>  	}
> +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }
>  

Sure, I believe so. I was also thinking out loud about whether we can avoid
raising the softirq in some cases in rcu_read_unlock_special():

For example, in rcu_read_unlock_special()

static void rcu_read_unlock_special(struct task_struct *t)
{
[...]
	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
	    t->rcu_read_unlock_special.s) {
		/* Need to defer quiescent state until everything is enabled. */
		raise_softirq_irqoff(RCU_SOFTIRQ);
		local_irq_restore(flags);
		return;
	}
	rcu_preempt_deferred_qs_irqrestore(t, flags);
}

Instead of raising the softirq, for the case where irqs are enabled, but
preemption is disabled, can we not just do:

		set_tsk_need_resched(current);
		set_preempt_need_resched();

and return? I am not sure what the benefits of doing that are, but it seems
nice to avoid raising the softirq if possible, for the benefit of real-time
workloads.

Also, it seems like there is a chance the softirq might run before
preemption is re-enabled anyway, right?

Also, one last thing: in your patch, do we really need to test for
"t->rcu_read_unlock_special.s" in rcu_read_unlock_special()? AFAICT,
rcu_read_unlock_special() would only be called if t->rcu_read_unlock_special.s
is set in the first place, so we can drop that test.

thanks,

- Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-18  2:07                                 ` Joel Fernandes
@ 2018-10-18 14:46                                   ` Paul E. McKenney
  2018-10-19  0:03                                     ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-18 14:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Wed, Oct 17, 2018 at 07:07:51PM -0700, Joel Fernandes wrote:
> On Wed, Oct 17, 2018 at 01:33:24PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 17, 2018 at 11:15:05AM -0700, Joel Fernandes wrote:
> > > On Wed, Oct 17, 2018 at 09:11:00AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> > > > > On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > > > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > > > > > [...]
> > > > > > > > > > > In any case, please don't spin for milliseconds with preemption disabled.
> > > > > > > > > > > The real-time guys are unlikely to be happy with you if you do this!
> > > > > > > > > > 
> > > > > > > > > > Well just to clarify, I was just running Oleg's test which did this. This
> > > > > > > > > > test was mentioned in the original documentation that I deleted. Ofcourse I
> > > > > > > > > > would not dare do such a thing in production code :-D. I guess to Oleg's
> > > > > > > > > > defense, he did it to very that synchronize_rcu() was not blocked on
> > > > > > > > > > preempt-disable sections which was a different test.
> > > > > > > > > 
> > > > > > > > > Understood!  Just pointing out that RCU's tolerating a given action does
> > > > > > > > > not necessarily mean that it is a good idea to take that action.  ;-)
> > > > > > > > 
> > > > > > > > Makes sense :-) thanks.
> > > > > > > 
> > > > > > > Don't worry, that won't happen again.  ;-)
> > > > > > > 
> > > > > > > > > > > > > +		pr_crit("SPIN done!\n");
> > > > > > > > > > > > > +		preempt_enable();
> > > > > > > > > > > > > +		break;
> > > > > > > > > > > > > +	case 777:
> > > > > > > > > > > > > +		pr_crit("SYNC start\n");
> > > > > > > > > > > > > +		synchronize_rcu();
> > > > > > > > > > > > > +		pr_crit("SYNC done!\n");
> > > > > > > > > > > > 
> > > > > > > > > > > > But you are using the console printing infrastructure which is rather
> > > > > > > > > > > > heavyweight. Try replacing pr_* calls with trace_printk so that you
> > > > > > > > > > > > write to the lock-free ring buffer, this will reduce the noise from the
> > > > > > > > > > > > heavy console printing infrastructure.
> > > > > > > > > > > 
> > > > > > > > > > > And this might be a problem as well.
> > > > > > > > > > 
> > > > > > > > > > This was not the issue (or atleast not fully the issue) since I saw the same
> > > > > > > > > > thing with trace_printk. It was exactly what you said - which is the
> > > > > > > > > > excessively long preempt disabled times.
> > > > > > > > > 
> > > > > > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > > > > > does not have consolidated grace periods.  You might then be able to
> > > > > > > > > tell if the pr_crit() calls make any difference.
> > > > > > > > 
> > > > > > > > I could do that, yeah. But since the original problem went away due to
> > > > > > > > disabling preempts for a short while, I will move on and continue to focus on
> > > > > > > > updating other parts of the documenation. Just to mention I
> > > > > > > > brought this up because I thought its better to do that than not to, just
> > > > > > > > incase there is any lurking issue with the consolidation. Sorry if that ended
> > > > > > > > up with me being noisy.
> > > > > > > 
> > > > > > > Not a problem, no need to apologize!
> > > > > > 
> > > > > > Besides, digging through the code did point out a reasonable optimization.
> > > > > > In the common case, this would buy 100s of microseconds rather than
> > > > > > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> > > > > 
> > > > > Cool, thanks. One comment below:
> > > > > 
> > > > > > ------------------------------------------------------------------------
> > > > > > 
> > > > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > > > 
> > > > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > > > >     
> > > > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > > > >     RCU core code to immediately report the quiescent state.  And this does
> > > > > >     happen in the case where the reader has been preempted.  But it would
> > > > > >     also be a nice performance optimization if immediate reporting also
> > > > > >     happened in the preemption-free case.
> > > > > >     
> > > > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > > > >     it has interrupted an RCU read-side critical section, and this causes
> > > > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > > > >     If preemption is disabled, then the report is required to be deferred
> > > > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > > > >     
> > > > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > > > >     If you do crazy things like this, reporting will be deferred until some
> > > > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > > > >     
> > > > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > 
> > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > > > --- a/include/linux/sched.h
> > > > > > +++ b/include/linux/sched.h
> > > > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > > > >  	struct {
> > > > > >  		u8			blocked;
> > > > > >  		u8			need_qs;
> > > > > > +		u8			exp_hint; /* Hint for performance. */
> > > > > > +		u8			pad; /* No garbage from compiler! */
> > > > > >  	} b; /* Bits. */
> > > > > > -	u16 s; /* Set of bits. */
> > > > > > +	u32 s; /* Set of bits. */
> > > > > >  };
> > > > > >  
> > > > > >  enum perf_event_task_context {
> > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > > index e669ccf3751b..928fe5893a57 100644
> > > > > > --- a/kernel/rcu/tree_exp.h
> > > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > > > >  	 */
> > > > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > -		if (rnp->expmask & rdp->grpmask)
> > > > > > +		if (rnp->expmask & rdp->grpmask) {
> > > > > >  			rdp->deferred_qs = true;
> > > > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > > > +		}
> > > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > >  	}
> > > > > >  
> > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > > >  	local_irq_save(flags);
> > > > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > > > +	    t->rcu_read_unlock_special.s) {
> > > > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > 
> > > > > Still going through this patch, but it seems to me like the fact that
> > > > > rcu_read_unlock_special is called means someone has requested for a grace
> > > > > period. Then in that case, does it not make sense to raise the softirq
> > > > > for processing anyway?
> > > > 
> > > > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > > > be called is if the RCU read-side critical section had been preempted,
> > > > in which case there might not even be a grace period in progress.
> > > 
> > > Yes true, it was at the back of my head ;) It needs to remove itself from the
> > > blocked lists on the unlock. And ofcourse the preemption case is alsoo
> > > clearly mentioned in this function's comments. (slaps self).
> > 
> > Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)
> > 
> > > > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > > > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > > > raising softirq, as it can instead just immediately report the quiescent
> > > > state.
> > > 
> > > Makes sense. I will go through these code paths more today. Thank you for the
> > > explanations!
> > > 
> > > I think something like need_exp_qs instead of 'exp_hint' may be more
> > > descriptive?
> > 
> > Well, it is only a hint due to the fact that it is not preserved across
> > complex sequences of overlapping RCU read-side critical sections of
> > different types.  So if you have the following sequence:
> > 
> > 	rcu_read_lock();
> > 	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
> > 	preempt_disable();
> > 	rcu_read_unlock(); /* Clears ->exp_hint. */
> > 	preempt_enable(); /* But ->exp_hint is already cleared. */
> > 
> > This is OK because there will be some later event that passes the quiescent
> > state to the RCU core.  This will slow down the expedited grace period,
> > but this case should be uncommon.  If it does turn out to be common, then
> > some more complex scheme can be put in place.
> > 
> > Hmmm...  This patch does need some help, doesn't it?  How about the following
> > to be folded into the original?
> > 
> > commit d8d996385055d4708121fa253e04b4272119f5e2
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Wed Oct 17 13:32:25 2018 -0700
> > 
> >     fixup! rcu: Speed up expedited GPs when interrupting RCU reader
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index d6286eb6e77e..117aeb582fdc 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >  		local_irq_restore(flags);
> >  		return;
> >  	}
> > +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> >  	rcu_preempt_deferred_qs_irqrestore(t, flags);
> >  }
> >  
> 
> Sure, I believe so. I was also thinking out loud about whether we can avoid
> raising the softirq in some cases in rcu_read_unlock_special():
> 
> For example, in rcu_read_unlock_special()
> 
> static void rcu_read_unlock_special(struct task_struct *t)
> {
> [...]
> 	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> 	    t->rcu_read_unlock_special.s) {
> 		/* Need to defer quiescent state until everything is enabled. */
> 		raise_softirq_irqoff(RCU_SOFTIRQ);
> 		local_irq_restore(flags);
> 		return;
> 	}
> 	rcu_preempt_deferred_qs_irqrestore(t, flags);
> }
> 
> Instead of raising the softirq, for the case where irqs are enabled, but
> preemption is disabled, can we not just do:
> 
> 		set_tsk_need_resched(current);
> 		set_preempt_need_resched();
> 
> and return? I am not sure what the benefits of doing that are, but it seems
> nice to avoid raising the softirq if possible, for the benefit of real-time
> workloads.

This approach would work very well in the case when preemption or bottom
halves were disabled, but would not handle the case where interrupts were
enabled during the RCU read-side critical section, an expedited grace
period started (thus setting ->exp_hint), interrupts were then disabled,
and finally rcu_read_unlock() was invoked.  Re-enabling interrupts would
not cause either softirq or the scheduler to do anything, so the end of
the expedited grace period might be delayed for some time, for example,
until the next scheduling-clock interrupt.

But please see below.

> Also, it seems like there is a chance the softirq might run before
> preemption is re-enabled anyway, right?

Not unless the rcu_read_unlock() is invoked from within a softirq
handler on the one hand or within an interrupt handler that interrupted
a preempt-disable region of code.  Otherwise, because interrupts are
disabled, the raise_softirq() will wake up ksoftirqd, which cannot run
until both preemption and bottom halves are enabled.

> Also, one last thing: in your patch, do we really need to test for
> "t->rcu_read_unlock_special.s" in rcu_read_unlock_special()? AFAICT,
> rcu_read_unlock_special() would only be called if t->rcu_read_unlock_special.s
> is set in the first place, so we can drop that test.

Good point!

How about the following?

							Thanx, Paul

------------------------------------------------------------------------

static void rcu_read_unlock_special(struct task_struct *t)
{
	unsigned long flags;
	bool preempt_bh_were_disabled =
			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
	bool irqs_were_disabled;

	/* NMI handlers cannot block and cannot safely manipulate state. */
	if (in_nmi())
		return;

	local_irq_save(flags);
	irqs_were_disabled = irqs_disabled_flags(flags);
	if (preempt_bh_were_disabled || irqs_were_disabled) {
		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
		/* Need to defer quiescent state until everything is enabled. */
		if (irqs_were_disabled) {
			raise_softirq_irqoff(RCU_SOFTIRQ);
		} else {
			set_tsk_need_resched(current);
			set_preempt_need_resched();
		}
		local_irq_restore(flags);
		return;
	}
	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
	rcu_preempt_deferred_qs_irqrestore(t, flags);
}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-18 14:46                                   ` Paul E. McKenney
@ 2018-10-19  0:03                                     ` Joel Fernandes
  2018-10-19  0:19                                       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19  0:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Thu, Oct 18, 2018 at 07:46:37AM -0700, Paul E. McKenney wrote:
[..]
> > > > > > > ------------------------------------------------------------------------
> > > > > > > 
> > > > > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > > > > 
> > > > > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > > > > >     
> > > > > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > > > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > > > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > > > > >     RCU core code to immediately report the quiescent state.  And this does
> > > > > > >     happen in the case where the reader has been preempted.  But it would
> > > > > > >     also be a nice performance optimization if immediate reporting also
> > > > > > >     happened in the preemption-free case.
> > > > > > >     
> > > > > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > > > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > > > > >     it has interrupted an RCU read-side critical section, and this causes
> > > > > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > > > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > > > > >     If preemption is disabled, then the report is required to be deferred
> > > > > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > > > > >     
> > > > > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > > > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > > > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > > > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > > > > >     If you do crazy things like this, reporting will be deferred until some
> > > > > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > > > > >     
> > > > > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > 
> > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > > > > --- a/include/linux/sched.h
> > > > > > > +++ b/include/linux/sched.h
> > > > > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > > > > >  	struct {
> > > > > > >  		u8			blocked;
> > > > > > >  		u8			need_qs;
> > > > > > > +		u8			exp_hint; /* Hint for performance. */
> > > > > > > +		u8			pad; /* No garbage from compiler! */
> > > > > > >  	} b; /* Bits. */
> > > > > > > -	u16 s; /* Set of bits. */
> > > > > > > +	u32 s; /* Set of bits. */
> > > > > > >  };
> > > > > > >  
> > > > > > >  enum perf_event_task_context {
> > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > > > index e669ccf3751b..928fe5893a57 100644
> > > > > > > --- a/kernel/rcu/tree_exp.h
> > > > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > > > > >  	 */
> > > > > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > > -		if (rnp->expmask & rdp->grpmask)
> > > > > > > +		if (rnp->expmask & rdp->grpmask) {
> > > > > > >  			rdp->deferred_qs = true;
> > > > > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > > > > +		}
> > > > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > > >  	}
> > > > > > >  
> > > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > > > >  	local_irq_save(flags);
> > > > > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > > > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > > > > +	    t->rcu_read_unlock_special.s) {
> > > > > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > > 
> > > > > > Still going through this patch, but it seems to me like the fact that
> > > > > > rcu_read_unlock_special is called means someone has requested for a grace
> > > > > > period. Then in that case, does it not make sense to raise the softirq
> > > > > > for processing anyway?
> > > > > 
> > > > > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > > > > be called is if the RCU read-side critical section had been preempted,
> > > > > in which case there might not even be a grace period in progress.
> > > > 
> > > > Yes true, it was at the back of my head ;) It needs to remove itself from the
> > > > blocked lists on the unlock. And ofcourse the preemption case is alsoo
> > > > clearly mentioned in this function's comments. (slaps self).
> > > 
> > > Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)
> > > 
> > > > > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > > > > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > > > > raising softirq, as it can instead just immediately report the quiescent
> > > > > state.
> > > > 
> > > > Makes sense. I will go through these code paths more today. Thank you for the
> > > > explanations!
> > > > 
> > > > I think something like need_exp_qs instead of 'exp_hint' may be more
> > > > descriptive?
> > > 
> > > Well, it is only a hint due to the fact that it is not preserved across
> > > complex sequences of overlapping RCU read-side critical sections of
> > > different types.  So if you have the following sequence:
> > > 
> > > 	rcu_read_lock();
> > > 	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
> > > 	preempt_disable();
> > > 	rcu_read_unlock(); /* Clears ->exp_hint. */
> > > 	preempt_enable(); /* But ->exp_hint is already cleared. */
> > > 
> > > This is OK because there will be some later event that passes the quiescent
> > > state to the RCU core.  This will slow down the expedited grace period,
> > > but this case should be uncommon.  If it does turn out to be common, then
> > > some more complex scheme can be put in place.
> > > 
> > > Hmmm...  This patch does need some help, doesn't it?  How about the following
> > > to be folded into the original?
> > > 
> > > commit d8d996385055d4708121fa253e04b4272119f5e2
> > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > Date:   Wed Oct 17 13:32:25 2018 -0700
> > > 
> > >     fixup! rcu: Speed up expedited GPs when interrupting RCU reader
> > >     
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index d6286eb6e77e..117aeb582fdc 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > >  		local_irq_restore(flags);
> > >  		return;
> > >  	}
> > > +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > >  	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > >  }
> > >  
> > 
> > Sure, I believe so. I was also thinking out loud about whether we can avoid
> > raising the softirq for some cases in rcu_read_unlock_special():
> > 
> > For example, in rcu_read_unlock_special()
> > 
> > static void rcu_read_unlock_special(struct task_struct *t)
> > {
> > [...]
> > 	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > 	    t->rcu_read_unlock_special.s) {
> > 		/* Need to defer quiescent state until everything is enabled. */
> > 		raise_softirq_irqoff(RCU_SOFTIRQ);
> > 		local_irq_restore(flags);
> > 		return;
> > 	}
> > 	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > }
> > 
> > Instead of raising the softirq, for the case where irqs are enabled, but
> > preemption is disabled, can we not just do:
> > 
> > 		set_tsk_need_resched(current);
> > 		set_preempt_need_resched();
> > 
> > and return? Not sure what the benefits of doing that are, but it seems nice
> > to avoid raising the softirq if possible, for the benefit of real-time workloads.
> 
> This approach would work very well in the case when preemption or bottom
> halves were disabled, but would not handle the case where interrupts were
> enabled during the RCU read-side critical section, an expedited grace
> period started (thus setting ->exp_hint), interrupts were then disabled,
> and finally rcu_read_unlock() was invoked.  Re-enabling interrupts would
> not cause either softirq or the scheduler to do anything, so the end of
> the expedited grace period might be delayed for some time, for example,
> until the next scheduling-clock interrupt.
> 
> But please see below.
> 
> > Also it seems like there is a chance the softirq might run before the
> > preemption is re-enabled anyway, right?
> 
> Not unless the rcu_read_unlock() is invoked from within a softirq
> handler on the one hand or within an interrupt handler that interrupted
> a preempt-disable region of code.  Otherwise, because interrupts are
> disabled, the raise_softirq() will wake up ksoftirqd, which cannot run
> until both preemption and bottom halves are enabled.
> 
> > Also one last thing, in your patch - do we really need to test for
> > "t->rcu_read_unlock_special.s" in rcu_read_unlock_special()? AFAICT,
> > rcu_read_unlock_special would only be called if t->rcu_read_unlock_special.s
> > is set in the first place so we can drop the test for that.
> 
> Good point!
> 
> How about the following?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> static void rcu_read_unlock_special(struct task_struct *t)
> {
> 	unsigned long flags;
> 	bool preempt_bh_were_disabled =
> 			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> 	bool irqs_were_disabled;
> 
> 	/* NMI handlers cannot block and cannot safely manipulate state. */
> 	if (in_nmi())
> 		return;
> 
> 	local_irq_save(flags);
> 	irqs_were_disabled = irqs_disabled_flags(flags);
> 	if (preempt_bh_were_disabled || irqs_were_disabled) {
> 		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> 		/* Need to defer quiescent state until everything is enabled. */
> 		if (irqs_were_disabled) {
> 			raise_softirq_irqoff(RCU_SOFTIRQ);
> 		} else {
> 			set_tsk_need_resched(current);
> 			set_preempt_need_resched();
> 		}

Looks good to me, thanks! Maybe some code comments would be nice as well.

Shouldn't we also set_tsk_need_resched() for the irqs_were_disabled case, so
that, say, if we are in an IRQ-disabled region (local_irq_disable), then
ksoftirqd would run as soon as possible once IRQs are re-enabled?

By the way, the user calling preempt_enable_no_resched would be another case
where the expedited grace period might extend longer than needed with the
above patch, but that seems unlikely enough to worry about :-)

thanks,

- Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  0:03                                     ` Joel Fernandes
@ 2018-10-19  0:19                                       ` Paul E. McKenney
  2018-10-19  1:12                                         ` Steven Rostedt
  2018-10-19  1:26                                         ` Joel Fernandes
  0 siblings, 2 replies; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-19  0:19 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Thu, Oct 18, 2018 at 05:03:50PM -0700, Joel Fernandes wrote:
> On Thu, Oct 18, 2018 at 07:46:37AM -0700, Paul E. McKenney wrote:
> [..]
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > > 
> > > > > > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > > > > > 
> > > > > > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > > > > > >     
> > > > > > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > > > > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > > > > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > > > > > >     RCU core code to immediately report the quiescent state.  And this does
> > > > > > > >     happen in the case where the reader has been preempted.  But it would
> > > > > > > >     also be a nice performance optimization if immediate reporting also
> > > > > > > >     happened in the preemption-free case.
> > > > > > > >     
> > > > > > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > > > > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > > > > > >     it has interrupted an RCU read-side critical section, and this causes
> > > > > > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > > > > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > > > > > >     If preemption is disabled, then the report is required to be deferred
> > > > > > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > > > > > >     
> > > > > > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > > > > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > > > > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > > > > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > > > > > >     If you do crazy things like this, reporting will be deferred until some
> > > > > > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > > > > > >     
> > > > > > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > > > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > > 
> > > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > > > > > --- a/include/linux/sched.h
> > > > > > > > +++ b/include/linux/sched.h
> > > > > > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > > > > > >  	struct {
> > > > > > > >  		u8			blocked;
> > > > > > > >  		u8			need_qs;
> > > > > > > > +		u8			exp_hint; /* Hint for performance. */
> > > > > > > > +		u8			pad; /* No garbage from compiler! */
> > > > > > > >  	} b; /* Bits. */
> > > > > > > > -	u16 s; /* Set of bits. */
> > > > > > > > +	u32 s; /* Set of bits. */
> > > > > > > >  };
> > > > > > > >  
> > > > > > > >  enum perf_event_task_context {
> > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > > > > index e669ccf3751b..928fe5893a57 100644
> > > > > > > > --- a/kernel/rcu/tree_exp.h
> > > > > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > > > > > >  	 */
> > > > > > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > > > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > > > -		if (rnp->expmask & rdp->grpmask)
> > > > > > > > +		if (rnp->expmask & rdp->grpmask) {
> > > > > > > >  			rdp->deferred_qs = true;
> > > > > > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > > > > > +		}
> > > > > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > > > >  	}
> > > > > > > >  
> > > > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > > > > >  	local_irq_save(flags);
> > > > > > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > > > > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > > > > > +	    t->rcu_read_unlock_special.s) {
> > > > > > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > > > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > > > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > > > 
> > > > > > > Still going through this patch, but it seems to me like the fact that
> > > > > > > rcu_read_unlock_special is called means someone has requested for a grace
> > > > > > > period. Then in that case, does it not make sense to raise the softirq
> > > > > > > for processing anyway?
> > > > > > 
> > > > > > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > > > > > be called is if the RCU read-side critical section had been preempted,
> > > > > > in which case there might not even be a grace period in progress.
> > > > > 
> > > > > Yes true, it was at the back of my head ;) It needs to remove itself from the
> > > > > > blocked lists on the unlock. And of course the preemption case is also
> > > > > clearly mentioned in this function's comments. (slaps self).
> > > > 
> > > > Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)
> > > > 
> > > > > > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > > > > > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > > > > > raising softirq, as it can instead just immediately report the quiescent
> > > > > > state.
> > > > > 
> > > > > Makes sense. I will go through these code paths more today. Thank you for the
> > > > > explanations!
> > > > > 
> > > > > I think something like need_exp_qs instead of 'exp_hint' may be more
> > > > > descriptive?
> > > > 
> > > > Well, it is only a hint due to the fact that it is not preserved across
> > > > complex sequences of overlapping RCU read-side critical sections of
> > > > different types.  So if you have the following sequence:
> > > > 
> > > > 	rcu_read_lock();
> > > > 	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
> > > > 	preempt_disable();
> > > > 	rcu_read_unlock(); /* Clears ->exp_hint. */
> > > > 	preempt_enable(); /* But ->exp_hint is already cleared. */
> > > > 
> > > > This is OK because there will be some later event that passes the quiescent
> > > > state to the RCU core.  This will slow down the expedited grace period,
> > > > but this case should be uncommon.  If it does turn out to be common, then
> > > > some more complex scheme can be put in place.
> > > > 
> > > > Hmmm...  This patch does need some help, doesn't it?  How about the following
> > > > to be folded into the original?
> > > > 
> > > > commit d8d996385055d4708121fa253e04b4272119f5e2
> > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > Date:   Wed Oct 17 13:32:25 2018 -0700
> > > > 
> > > >     fixup! rcu: Speed up expedited GPs when interrupting RCU reader
> > > >     
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index d6286eb6e77e..117aeb582fdc 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > >  		local_irq_restore(flags);
> > > >  		return;
> > > >  	}
> > > > +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > >  	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > >  }
> > > >  
> > > 
> > > Sure, I believe so. I was also thinking out loud about whether we can avoid
> > > raising the softirq for some cases in rcu_read_unlock_special():
> > > 
> > > For example, in rcu_read_unlock_special()
> > > 
> > > static void rcu_read_unlock_special(struct task_struct *t)
> > > {
> > > [...]
> > > 	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > 	    t->rcu_read_unlock_special.s) {
> > > 		/* Need to defer quiescent state until everything is enabled. */
> > > 		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > 		local_irq_restore(flags);
> > > 		return;
> > > 	}
> > > 	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > }
> > > 
> > > Instead of raising the softirq, for the case where irqs are enabled, but
> > > preemption is disabled, can we not just do:
> > > 
> > > 		set_tsk_need_resched(current);
> > > 		set_preempt_need_resched();
> > > 
> > > and return? Not sure what the benefits of doing that are, but it seems nice to
> > > avoid raising the softirq if possible, for benefit of real-time workloads.
> > 
> > This approach would work very well in the case when preemption or bottom
> > halves were disabled, but would not handle the case where interrupts were
> > enabled during the RCU read-side critical section, an expedited grace
> > period started (thus setting ->exp_hint), interrupts were then disabled,
> > and finally rcu_read_unlock() was invoked.  Re-enabling interrupts would
> > not cause either softirq or the scheduler to do anything, so the end of
> > the expedited grace period might be delayed for some time, for example,
> > until the next scheduling-clock interrupt.
> > 
> > But please see below.
> > 
> > > Also it seems like there is a chance the softirq might run before the
> > > preemption is re-enabled anyway, right?
> > 
> > Not unless the rcu_read_unlock() is invoked from within a softirq
> > handler on the one hand or within an interrupt handler that interrupted
> > a preempt-disable region of code.  Otherwise, because interrupts are
> > disabled, the raise_softirq() will wake up ksoftirqd, which cannot run
> > until both preemption and bottom halves are enabled.
> > 
> > > Also one last thing, in your patch - do we really need to test for
> > > "t->rcu_read_unlock_special.s" in rcu_read_unlock_special()? AFAICT,
> > > rcu_read_unlock_special would only be called if t->rcu_read_unlock_special.s
> > > is set in the first place so we can drop the test for that.
> > 
> > Good point!
> > 
> > How about the following?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > static void rcu_read_unlock_special(struct task_struct *t)
> > {
> > 	unsigned long flags;
> > 	bool preempt_bh_were_disabled =
> > 			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> > 	bool irqs_were_disabled;
> > 
> > 	/* NMI handlers cannot block and cannot safely manipulate state. */
> > 	if (in_nmi())
> > 		return;
> > 
> > 	local_irq_save(flags);
> > 	irqs_were_disabled = irqs_disabled_flags(flags);
> > 	if (preempt_bh_were_disabled || irqs_were_disabled) {
> > 		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > 		/* Need to defer quiescent state until everything is enabled. */
> > 		if (irqs_were_disabled) {
> > 			raise_softirq_irqoff(RCU_SOFTIRQ);
> > 		} else {
> > 			set_tsk_need_resched(current);
> > 			set_preempt_need_resched();
> > 		}
> 
> Looks good to me, thanks! Maybe some code comments would be nice as well.
> 
> > Shouldn't we also set_tsk_need_resched() for the irqs_were_disabled case, so
> > that, say, if we are in an IRQ-disabled region (local_irq_disable), then
> > ksoftirqd would run as soon as possible once IRQs are re-enabled?

Last I checked, local_irq_restore() didn't check for reschedules, instead
relying on IPIs and scheduling-clock interrupts to do its dirty work.
Has that changed?

> By the way, the user calling preempt_enable_no_resched would be another case
> where the expedited grace period might extend longer than needed with the
> above patch, but that seems unlikely enough to worry about :-)

I figured that whoever calls preempt_enable_no_resched() is taking the
responsibility for permitting preemption in the near future, and if they
fail to do so, they will get called on it.  Hard to hide from the latency
tracer, after all.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  0:19                                       ` Paul E. McKenney
@ 2018-10-19  1:12                                         ` Steven Rostedt
  2018-10-19  1:27                                           ` Joel Fernandes
  2018-10-19  1:26                                         ` Joel Fernandes
  1 sibling, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2018-10-19  1:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, 18 Oct 2018 17:19:32 -0700
"Paul E. McKenney" <paulmck@linux.ibm.com> wrote:

> I figured that whoever calls preempt_enable_no_resched() is taking the
> responsibility for permitting preemption in the near future, and if they
> fail to do so, they will get called on it.  Hard to hide from the latency
> tracer, after all.  ;-)

Correct, and doing a search for preempt_enable_no_resched() I see
there's one in the ftrace ring buffer code that was added a long time
ago (2008) to fix a recursion bug that no longer exists, and this can
now leak a preemption point.
I'll have to go fix that :-(

-- Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  0:19                                       ` Paul E. McKenney
  2018-10-19  1:12                                         ` Steven Rostedt
@ 2018-10-19  1:26                                         ` Joel Fernandes
  2018-10-19  1:50                                           ` Steven Rostedt
  1 sibling, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19  1:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nikolay Borisov, linux-kernel, Jonathan Corbet, Josh Triplett,
	Lai Jiangshan, linux-doc, Mathieu Desnoyers, Steven Rostedt

On Thu, Oct 18, 2018 at 05:19:32PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 18, 2018 at 05:03:50PM -0700, Joel Fernandes wrote:
> > On Thu, Oct 18, 2018 at 07:46:37AM -0700, Paul E. McKenney wrote:
> > [..]
> > > > > > > > > ------------------------------------------------------------------------
> > > > > > > > > 
> > > > > > > > > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > > > > > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > > > Date:   Tue Oct 16 04:12:58 2018 -0700
> > > > > > > > > 
> > > > > > > > >     rcu: Speed up expedited GPs when interrupting RCU reader
> > > > > > > > >     
> > > > > > > > >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> > > > > > > > >     CPU that is executing an RCU read-side critical section.  In that case,
> > > > > > > > >     it would be nice if the rcu_read_unlock() directly interacted with the
> > > > > > > > >     RCU core code to immediately report the quiescent state.  And this does
> > > > > > > > >     happen in the case where the reader has been preempted.  But it would
> > > > > > > > >     also be a nice performance optimization if immediate reporting also
> > > > > > > > >     happened in the preemption-free case.
> > > > > > > > >     
> > > > > > > > >     This commit therefore adds an ->exp_hint field to the task_struct structure's
> > > > > > > > >     ->rcu_read_unlock_special field.  The IPI handler sets this hint when
> > > > > > > > >     it has interrupted an RCU read-side critical section, and this causes
> > > > > > > > >     the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
> > > > > > > > >     which, if preemption is enabled, reports the quiescent state immediately.
> > > > > > > > >     If preemption is disabled, then the report is required to be deferred
> > > > > > > > >     until preemption (or bottom halves or interrupts or whatever) is re-enabled.
> > > > > > > > >     
> > > > > > > > >     Because this is a hint, it does nothing for more complicated cases.  For
> > > > > > > > >     example, if the IPI interrupts an RCU reader, but interrupts are disabled
> > > > > > > > >     across the rcu_read_unlock(), but another rcu_read_lock() is executed
> > > > > > > > >     before interrupts are re-enabled, the hint will already have been cleared.
> > > > > > > > >     If you do crazy things like this, reporting will be deferred until some
> > > > > > > > >     later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
> > > > > > > > >     
> > > > > > > > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > > > > > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > > > > > 
> > > > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > > > > > index 004ca21f7e80..64ce751b5fe9 100644
> > > > > > > > > --- a/include/linux/sched.h
> > > > > > > > > +++ b/include/linux/sched.h
> > > > > > > > > @@ -571,8 +571,10 @@ union rcu_special {
> > > > > > > > >  	struct {
> > > > > > > > >  		u8			blocked;
> > > > > > > > >  		u8			need_qs;
> > > > > > > > > +		u8			exp_hint; /* Hint for performance. */
> > > > > > > > > +		u8			pad; /* No garbage from compiler! */
> > > > > > > > >  	} b; /* Bits. */
> > > > > > > > > -	u16 s; /* Set of bits. */
> > > > > > > > > +	u32 s; /* Set of bits. */
> > > > > > > > >  };
> > > > > > > > >  
> > > > > > > > >  enum perf_event_task_context {
> > > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > > > > > index e669ccf3751b..928fe5893a57 100644
> > > > > > > > > --- a/kernel/rcu/tree_exp.h
> > > > > > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > > > > > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> > > > > > > > >  	 */
> > > > > > > > >  	if (t->rcu_read_lock_nesting > 0) {
> > > > > > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > > > > > -		if (rnp->expmask & rdp->grpmask)
> > > > > > > > > +		if (rnp->expmask & rdp->grpmask) {
> > > > > > > > >  			rdp->deferred_qs = true;
> > > > > > > > > +			WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > > > > > > > +		}
> > > > > > > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > > > > >  	}
> > > > > > > > >  
> > > > > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > > > > index 8b48bb7c224c..d6286eb6e77e 100644
> > > > > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > > > > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > > > > > >  	local_irq_save(flags);
> > > > > > > > >  	irqs_were_disabled = irqs_disabled_flags(flags);
> > > > > > > > >  	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > > > > > -	    t->rcu_read_unlock_special.b.blocked) {
> > > > > > > > > +	    t->rcu_read_unlock_special.s) {
> > > > > > > > >  		/* Need to defer quiescent state until everything is enabled. */
> > > > > > > > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > > > > > >  		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > > > > 
> > > > > > > > Still going through this patch, but it seems to me like the fact that
> > > > > > > > rcu_read_unlock_special is called means someone has requested for a grace
> > > > > > > > period. Then in that case, does it not make sense to raise the softirq
> > > > > > > > for processing anyway?
> > > > > > > 
> > > > > > > Not necessarily.  Another reason that rcu_read_unlock_special() might
> > > > > > > be called is if the RCU read-side critical section had been preempted,
> > > > > > > in which case there might not even be a grace period in progress.
> > > > > > 
> > > > > > Yes true, it was at the back of my head ;) It needs to remove itself from the
> > > > > > > blocked lists on the unlock. And of course the preemption case is also
> > > > > > clearly mentioned in this function's comments. (slaps self).
> > > > > 
> > > > > Sometimes rcutorture reminds me of interesting RCU corner cases...  ;-)
> > > > > 
> > > > > > > In addition, if interrupts, bottom halves, and preemption are all enabled,
> > > > > > > the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
> > > > > > > raising softirq, as it can instead just immediately report the quiescent
> > > > > > > state.
> > > > > > 
> > > > > > Makes sense. I will go through these code paths more today. Thank you for the
> > > > > > explanations!
> > > > > > 
> > > > > > I think something like need_exp_qs instead of 'exp_hint' may be more
> > > > > > descriptive?
> > > > > 
> > > > > Well, it is only a hint due to the fact that it is not preserved across
> > > > > complex sequences of overlapping RCU read-side critical sections of
> > > > > different types.  So if you have the following sequence:
> > > > > 
> > > > > 	rcu_read_lock();
> > > > > 	/* Someone does synchronize_rcu_expedited(), which sets ->exp_hint. */
> > > > > 	preempt_disable();
> > > > > 	rcu_read_unlock(); /* Clears ->exp_hint. */
> > > > > 	preempt_enable(); /* But ->exp_hint is already cleared. */
> > > > > 
> > > > > This is OK because there will be some later event that passes the quiescent
> > > > > state to the RCU core.  This will slow down the expedited grace period,
> > > > > but this case should be uncommon.  If it does turn out to be common, then
> > > > > some more complex scheme can be put in place.
> > > > > 
> > > > > Hmmm...  This patch does need some help, doesn't it?  How about the following
> > > > > to be folded into the original?
> > > > > 
> > > > > commit d8d996385055d4708121fa253e04b4272119f5e2
> > > > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > Date:   Wed Oct 17 13:32:25 2018 -0700
> > > > > 
> > > > >     fixup! rcu: Speed up expedited GPs when interrupting RCU reader
> > > > >     
> > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > > > 
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index d6286eb6e77e..117aeb582fdc 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -650,6 +650,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > >  		local_irq_restore(flags);
> > > > >  		return;
> > > > >  	}
> > > > > +	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > > >  	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > >  }
> > > > >  
> > > > 
> > > > Sure, I believe so. I was also thinking out loud about whether we can avoid
> > > > raising the softirq for some cases in rcu_read_unlock_special():
> > > > 
> > > > For example, in rcu_read_unlock_special()
> > > > 
> > > > static void rcu_read_unlock_special(struct task_struct *t)
> > > > {
> > > > [...]
> > > > 	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > 	    t->rcu_read_unlock_special.s) {
> > > > 		/* Need to defer quiescent state until everything is enabled. */
> > > > 		raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > 		local_irq_restore(flags);
> > > > 		return;
> > > > 	}
> > > > 	rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > }
> > > > 
> > > > Instead of raising the softirq, for the case where irqs are enabled, but
> > > > preemption is disabled, can we not just do:
> > > > 
> > > > 		set_tsk_need_resched(current);
> > > > 		set_preempt_need_resched();
> > > > 
> > > > and return? Not sure what the benefits of doing that are, but it seems nice to
> > > > avoid raising the softirq if possible, for benefit of real-time workloads.
> > > 
> > > This approach would work very well in the case when preemption or bottom
> > > halves were disabled, but would not handle the case where interrupts were
> > > enabled during the RCU read-side critical section, an expedited grace
> > > period started (thus setting ->exp_hint), interrupts were then disabled,
> > > and finally rcu_read_unlock() was invoked.  Re-enabling interrupts would
> > > not cause either softirq or the scheduler to do anything, so the end of
> > > the expedited grace period might be delayed for some time, for example,
> > > until the next scheduling-clock interrupt.
> > > 
> > > But please see below.
> > > 
> > > > Also it seems like there is a chance the softirq might run before the
> > > > preemption is re-enabled anyway, right?
> > > 
> > > Not unless the rcu_read_unlock() is invoked from within a softirq
> > > handler on the one hand or within an interrupt handler that interrupted
> > > a preempt-disable region of code.  Otherwise, because interrupts are
> > > disabled, the raise_softirq() will wake up ksoftirqd, which cannot run
> > > until both preemption and bottom halves are enabled.
> > > 
> > > > Also one last thing, in your patch - do we really need to test for
> > > > "t->rcu_read_unlock_special.s" in rcu_read_unlock_special()? AFAICT,
> > > > rcu_read_unlock_special would only be called if t->rcu_read_unlock_special.s
> > > > is set in the first place so we can drop the test for that.
> > > 
> > > Good point!
> > > 
> > > How about the following?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > static void rcu_read_unlock_special(struct task_struct *t)
> > > {
> > > 	unsigned long flags;
> > > 	bool preempt_bh_were_disabled =
> > > 			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> > > 	bool irqs_were_disabled;
> > > 
> > > 	/* NMI handlers cannot block and cannot safely manipulate state. */
> > > 	if (in_nmi())
> > > 		return;
> > > 
> > > 	local_irq_save(flags);
> > > 	irqs_were_disabled = irqs_disabled_flags(flags);
> > > 	if (preempt_bh_were_disabled || irqs_were_disabled) {
> > > 		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> > > 		/* Need to defer quiescent state until everything is enabled. */
> > > 		if (irqs_were_disabled) {
> > > 			raise_softirq_irqoff(RCU_SOFTIRQ);
> > > 		} else {
> > > 			set_tsk_need_resched(current);
> > > 			set_preempt_need_resched();
> > > 		}
> > 
> > Looks good to me, thanks! Maybe some code comments would be nice as well.
> > 
> > Shouldn't we also set_tsk_need_resched for the irqs_were_disabled case, so
> > that, say, if we are in an IRQ-disabled region (local_irq_disable), then
> > ksoftirqd would run as soon as possible once IRQs are re-enabled?
> 
> Last I checked, local_irq_restore() didn't check for reschedules, instead
> relying on IPIs and scheduling-clock interrupts to do its dirty work.
> Has that changed?

Yes, local_irq_restore() is lightweight and does not check for reschedules.

I was thinking of the case where ksoftirqd is woken up but does not run unless
we set the NEED_RESCHED flag. But that should get set anyway, since ksoftirqd
is probably of higher priority than the currently running task.

Roughly speaking the scenario could be something like:

rcu_read_lock();
                 <-- IPI comes in for the expedited GP, sets exp_hint
local_irq_disable();
// do a bunch of stuff
rcu_read_unlock();   <-- This calls rcu_read_unlock_special(), which raises
                         the softirq and wakes up ksoftirqd.
local_irq_enable();

// Now ksoftirqd is ready to run, but we don't switch into the
// scheduler for some time because tif_need_resched() returns false and
// any cond_resched() calls do nothing. So we potentially spend lots of
// time before the next scheduling event.

You think this should not be an issue?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  1:12                                         ` Steven Rostedt
@ 2018-10-19  1:27                                           ` Joel Fernandes
  0 siblings, 0 replies; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19  1:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, Oct 18, 2018 at 09:12:45PM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 17:19:32 -0700
> "Paul E. McKenney" <paulmck@linux.ibm.com> wrote:
> 
> > I figured that whoever calls preempt_enable_no_resched() is taking the
> > responsibility for permitting preemption in the near future, and if they
> > fail to do so, they will get called on it.  Hard to hide from the latency
> > tracer, after all.  ;-)
> 
> Correct, and doing a search of preempt_enable_no_resched() I see
> there's one in the ftrace ring buffer code, that was added a long time
> ago (2008) to fix a recursion bug that no longer exists, and this now
> can leak a preemption point.
> 
> I'll have to go fix that :-(

Cool! Glad you found this issue in the code while we were discussing it ;)

thanks,

- Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  1:26                                         ` Joel Fernandes
@ 2018-10-19  1:50                                           ` Steven Rostedt
  2018-10-19  2:25                                             ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2018-10-19  1:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, 18 Oct 2018 18:26:45 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> Yes, local_irq_restore is light weight, and does not check for reschedules.
> 
> I was thinking of case where ksoftirqd is woken up, but does not run unless
> we set the NEED_RESCHED flag. But that should get set anyway since probably
> ksoftirqd is of high enough priority than the currently running task..
> 
> Roughly speaking the scenario could be something like:
> 
> rcu_read_lock();
>                  <-- IPI comes in for the expedited GP, sets exp_hint
> local_irq_disable();
> // do a bunch of stuff
> rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
>                          the soft irq, and wakesup softirqd.

If softirqd is of higher priority than the current running task, then
the try_to_wake_up() will set NEED_RESCHED of the current task here.

-- Steve

> local_irq_enable();
> 
> // Now ksoftirqd is ready to run but we don't switch into the
> // scheduler for sometime because tif_need_resched() returns false and
> // any cond_resched calls do nothing. So we potentially spend lots of
> // time before the next scheduling event.
> 
> You think this should not be an issue?



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  1:50                                           ` Steven Rostedt
@ 2018-10-19  2:25                                             ` Joel Fernandes
  2018-10-19  2:52                                               ` Steven Rostedt
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19  2:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 18:26:45 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > 
> > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > ksoftirqd is of high enough priority than the currently running task..
> > 
> > Roughly speaking the scenario could be something like:
> > 
> > rcu_read_lock();
> >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > local_irq_disable();
> > // do a bunch of stuff
> > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> >                          the soft irq, and wakesup softirqd.
> 
> If softirqd is of higher priority than the current running task, then
> the try_to_wake_up() will set NEED_RESCHED of the current task here.
> 

Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
grace periods are quite important and should complete quickly, which is the
whole reason for interrupting RCU read-side sections with an IPI in the first
place. IMO there should be no harm in setting NEED_RESCHED unconditionally
anyway, for the possible benefit of systems where ksoftirqd is not of higher
priority than the currently running task and we need to run it soon on the
CPU. But I'm OK with whatever Paul and you want to do here.

- Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  2:25                                             ` Joel Fernandes
@ 2018-10-19  2:52                                               ` Steven Rostedt
  2018-10-19  3:58                                                 ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2018-10-19  2:52 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, 18 Oct 2018 19:25:29 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> > On Thu, 18 Oct 2018 18:26:45 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >   
> > > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > > 
> > > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > > ksoftirqd is of high enough priority than the currently running task..
> > > 
> > > Roughly speaking the scenario could be something like:
> > > 
> > > rcu_read_lock();
> > >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > > local_irq_disable();
> > > // do a bunch of stuff
> > > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> > >                          the soft irq, and wakesup softirqd.  
> > 
> > If softirqd is of higher priority than the current running task, then
> > the try_to_wake_up() will set NEED_RESCHED of the current task here.
> >   
> 
> Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
> grace periods are quite important and they should complete quickly which is
> the whole reason for interrupting rcu read sections with an IPI and stuff.
> IMO there should be no harm in setting NEED_RESCHED unconditionally anyway
> for possible benefit of systems where the ksoftirqd is not of higher priority
> than the currently running task, and we need to run it soon on the CPU. But
> I'm Ok with whatever Paul and you want to do here.


Setting NEED_RESCHED unconditionally won't help, because even if we call
schedule(), ksoftirqd will not be scheduled! If it's CFS nice 0 and the
current task still has quota to run, calling schedule() will just waste
time calculating that the current task should still be running. It's
equivalent to calling yield() (which is why we removed all yield() users
in the kernel, because *all* of them were buggy!). This is *why* it only
calls schedule() *if* ksoftirqd is of higher priority.

-- Steve


* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  2:52                                               ` Steven Rostedt
@ 2018-10-19  3:58                                                 ` Joel Fernandes
  2018-10-19 12:07                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19  3:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, Oct 18, 2018 at 10:52:23PM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 19:25:29 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> > > On Thu, 18 Oct 2018 18:26:45 -0700
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >   
> > > > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > > > 
> > > > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > > > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > > > ksoftirqd is of high enough priority than the currently running task..
> > > > 
> > > > Roughly speaking the scenario could be something like:
> > > > 
> > > > rcu_read_lock();
> > > >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > > > local_irq_disable();
> > > > // do a bunch of stuff
> > > > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> > > >                          the soft irq, and wakesup softirqd.  
> > > 
> > > If softirqd is of higher priority than the current running task, then
> > > the try_to_wake_up() will set NEED_RESCHED of the current task here.
> > >   
> > 
> > Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
> > grace periods are quite important and they should complete quickly which is
> > the whole reason for interrupting rcu read sections with an IPI and stuff.
> > IMO there should be no harm in setting NEED_RESCHED unconditionally anyway
> > for possible benefit of systems where the ksoftirqd is not of higher priority
> > than the currently running task, and we need to run it soon on the CPU. But
> > I'm Ok with whatever Paul and you want to do here.
> 
> 
> Setting NEED_RESCHED unconditionally wont help. Because even if we call
> schedule() ksoftirqd will not be scheduled! If it's CFS nice 0, and the
> current task still has quota to run, if you call schedule, you'll just
> waste time calculating that the current task should still be running.
> It's equivalent to calling yield() (which is why we removed all yield()
> users in the kernel, because *all* of them were buggy!). This is *why*
> it only calls schedule *if* softirqd is of higher priority.

Yes, OK, you are right: the TTWU path should handle setting the NEED_RESCHED
flag or not, and unconditionally setting it does not get us anything. I had
to go through the code a bit, since it has been a while since I explored it.

So Paul, I'm OK with your latest patch for the issue we discussed and don't
think much more can be done barring raising ksoftirqd's priority :-) So I
guess synchronize_rcu_expedited() will just cope with the delay between
local_irq_enable() and the next scheduling point... :-)

thanks,

- Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19  3:58                                                 ` Joel Fernandes
@ 2018-10-19 12:07                                                   ` Paul E. McKenney
  2018-10-19 17:24                                                     ` Joel Fernandes
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-19 12:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Thu, Oct 18, 2018 at 08:58:44PM -0700, Joel Fernandes wrote:
> On Thu, Oct 18, 2018 at 10:52:23PM -0400, Steven Rostedt wrote:
> > On Thu, 18 Oct 2018 19:25:29 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > > On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> > > > On Thu, 18 Oct 2018 18:26:45 -0700
> > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >   
> > > > > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > > > > 
> > > > > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > > > > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > > > > ksoftirqd is of high enough priority than the currently running task..
> > > > > 
> > > > > Roughly speaking the scenario could be something like:
> > > > > 
> > > > > rcu_read_lock();
> > > > >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > > > > local_irq_disable();
> > > > > // do a bunch of stuff
> > > > > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> > > > >                          the soft irq, and wakesup softirqd.  
> > > > 
> > > > If softirqd is of higher priority than the current running task, then
> > > > the try_to_wake_up() will set NEED_RESCHED of the current task here.
> > > >   
> > > 
> > > Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
> > > grace periods are quite important and they should complete quickly which is
> > > the whole reason for interrupting rcu read sections with an IPI and stuff.
> > > IMO there should be no harm in setting NEED_RESCHED unconditionally anyway
> > > for possible benefit of systems where the ksoftirqd is not of higher priority
> > > than the currently running task, and we need to run it soon on the CPU. But
> > > I'm Ok with whatever Paul and you want to do here.
> > 
> > 
> > Setting NEED_RESCHED unconditionally wont help. Because even if we call
> > schedule() ksoftirqd will not be scheduled! If it's CFS nice 0, and the
> > current task still has quota to run, if you call schedule, you'll just
> > waste time calculating that the current task should still be running.
> > It's equivalent to calling yield() (which is why we removed all yield()
> > users in the kernel, because *all* of them were buggy!). This is *why*
> > it only calls schedule *if* softirqd is of higher priority.
> 
> Yes, ok. you are right the TTWU path should handle setting the NEED_RESCHED
> flag or not and unconditionally setting it does not get us anything. I had to
> go through the code a bit since it has been a while since I explored it.
> 
> So Paul, I'm Ok with your latest patch for the issue we discussed and don't
> think much more can be done barring raising of ksofitrqd priorities :-) So I
> guess the synchronize_rcu_expedited will just cope with the deal between
> local_irq_enable and the next scheduling point.. :-)

Thank you both!

Indeed, real-time systems need to be configured carefully, especially if
you are crazy enough to run them under high load.  I interpreted "Ok with
your latest patch" as an Acked-by, but please let me know if that is a
misinterpretation.

							Thanx, Paul



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19 12:07                                                   ` Paul E. McKenney
@ 2018-10-19 17:24                                                     ` Joel Fernandes
  2018-10-19 18:11                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-10-19 17:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Fri, Oct 19, 2018 at 05:07:58AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 18, 2018 at 08:58:44PM -0700, Joel Fernandes wrote:
> > On Thu, Oct 18, 2018 at 10:52:23PM -0400, Steven Rostedt wrote:
> > > On Thu, 18 Oct 2018 19:25:29 -0700
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > 
> > > > On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> > > > > On Thu, 18 Oct 2018 18:26:45 -0700
> > > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > >   
> > > > > > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > > > > > 
> > > > > > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > > > > > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > > > > > ksoftirqd is of high enough priority than the currently running task..
> > > > > > 
> > > > > > Roughly speaking the scenario could be something like:
> > > > > > 
> > > > > > rcu_read_lock();
> > > > > >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > > > > > local_irq_disable();
> > > > > > // do a bunch of stuff
> > > > > > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> > > > > >                          the soft irq, and wakesup softirqd.  
> > > > > 
> > > > > If softirqd is of higher priority than the current running task, then
> > > > > the try_to_wake_up() will set NEED_RESCHED of the current task here.
> > > > >   
> > > > 
> > > > Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
> > > > grace periods are quite important and they should complete quickly which is
> > > > the whole reason for interrupting rcu read sections with an IPI and stuff.
> > > > IMO there should be no harm in setting NEED_RESCHED unconditionally anyway
> > > > for possible benefit of systems where the ksoftirqd is not of higher priority
> > > > than the currently running task, and we need to run it soon on the CPU. But
> > > > I'm Ok with whatever Paul and you want to do here.
> > > 
> > > 
> > > Setting NEED_RESCHED unconditionally wont help. Because even if we call
> > > schedule() ksoftirqd will not be scheduled! If it's CFS nice 0, and the
> > > current task still has quota to run, if you call schedule, you'll just
> > > waste time calculating that the current task should still be running.
> > > It's equivalent to calling yield() (which is why we removed all yield()
> > > users in the kernel, because *all* of them were buggy!). This is *why*
> > > it only calls schedule *if* softirqd is of higher priority.
> > 
> > Yes, ok. you are right the TTWU path should handle setting the NEED_RESCHED
> > flag or not and unconditionally setting it does not get us anything. I had to
> > go through the code a bit since it has been a while since I explored it.
> > 
> > So Paul, I'm Ok with your latest patch for the issue we discussed and don't
> > think much more can be done barring raising of ksofitrqd priorities :-) So I
> > guess the synchronize_rcu_expedited will just cope with the deal between
> > local_irq_enable and the next scheduling point.. :-)
> 
> Thank you both!
> 
> Indeed, real-time systems need to be configured carefully, especially if
> you are crazy enough to run them under high load.  I interpreted "Ok with
> your latest patch" as an Acked-by, but please let me know if that is a
> misinterpretation.

Yes,

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks!

- Joel



* Re: [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption
  2018-10-19 17:24                                                     ` Joel Fernandes
@ 2018-10-19 18:11                                                       ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2018-10-19 18:11 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Nikolay Borisov, linux-kernel, Jonathan Corbet,
	Josh Triplett, Lai Jiangshan, linux-doc, Mathieu Desnoyers

On Fri, Oct 19, 2018 at 10:24:25AM -0700, Joel Fernandes wrote:
> On Fri, Oct 19, 2018 at 05:07:58AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 18, 2018 at 08:58:44PM -0700, Joel Fernandes wrote:
> > > On Thu, Oct 18, 2018 at 10:52:23PM -0400, Steven Rostedt wrote:
> > > > On Thu, 18 Oct 2018 19:25:29 -0700
> > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > 
> > > > > On Thu, Oct 18, 2018 at 09:50:35PM -0400, Steven Rostedt wrote:
> > > > > > On Thu, 18 Oct 2018 18:26:45 -0700
> > > > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > >   
> > > > > > > Yes, local_irq_restore is light weight, and does not check for reschedules.
> > > > > > > 
> > > > > > > I was thinking of case where ksoftirqd is woken up, but does not run unless
> > > > > > > we set the NEED_RESCHED flag. But that should get set anyway since probably
> > > > > > > ksoftirqd is of high enough priority than the currently running task..
> > > > > > > 
> > > > > > > Roughly speaking the scenario could be something like:
> > > > > > > 
> > > > > > > rcu_read_lock();
> > > > > > >                  <-- IPI comes in for the expedited GP, sets exp_hint
> > > > > > > local_irq_disable();
> > > > > > > // do a bunch of stuff
> > > > > > > rcu_read_unlock();   <-- This calls the rcu_read_unlock_special which raises
> > > > > > >                          the soft irq, and wakesup softirqd.  
> > > > > > 
> > > > > > If softirqd is of higher priority than the current running task, then
> > > > > > the try_to_wake_up() will set NEED_RESCHED of the current task here.
> > > > > >   
> > > > > 
> > > > > Yes, only *if*. On my system, ksoftirqd is CFS nice 0. I thought expedited
> > > > > grace periods are quite important and they should complete quickly which is
> > > > > the whole reason for interrupting rcu read sections with an IPI and stuff.
> > > > > IMO there should be no harm in setting NEED_RESCHED unconditionally anyway
> > > > > for possible benefit of systems where the ksoftirqd is not of higher priority
> > > > > than the currently running task, and we need to run it soon on the CPU. But
> > > > > I'm Ok with whatever Paul and you want to do here.
> > > > 
> > > > 
> > > > Setting NEED_RESCHED unconditionally wont help. Because even if we call
> > > > schedule() ksoftirqd will not be scheduled! If it's CFS nice 0, and the
> > > > current task still has quota to run, if you call schedule, you'll just
> > > > waste time calculating that the current task should still be running.
> > > > It's equivalent to calling yield() (which is why we removed all yield()
> > > > users in the kernel, because *all* of them were buggy!). This is *why*
> > > > it only calls schedule *if* softirqd is of higher priority.
> > > 
> > > Yes, ok. you are right the TTWU path should handle setting the NEED_RESCHED
> > > flag or not and unconditionally setting it does not get us anything. I had to
> > > go through the code a bit since it has been a while since I explored it.
> > > 
> > > So Paul, I'm Ok with your latest patch for the issue we discussed and don't
> > > think much more can be done barring raising of ksofitrqd priorities :-) So I
> > > guess the synchronize_rcu_expedited will just cope with the deal between
> > > local_irq_enable and the next scheduling point.. :-)
> > 
> > Thank you both!
> > 
> > Indeed, real-time systems need to be configured carefully, especially if
> > you are crazy enough to run them under high load.  I interpreted "Ok with
> > your latest patch" as an Acked-by, but please let me know if that is a
> > misinterpretation.
> 
> Yes,
> 
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Very good, pre-applied, thank you!  ;-)

							Thanx, Paul



end of thread, other threads:[~2018-10-19 18:11 UTC | newest]

Thread overview: 32+ messages
2018-10-14 21:29 [PATCH RFC] doc: rcu: remove obsolete (non-)requirement about disabling preemption Joel Fernandes (Google)
2018-10-14 23:17 ` Paul E. McKenney
2018-10-15  2:08   ` Joel Fernandes
2018-10-15  2:13     ` Joel Fernandes
2018-10-15  2:33       ` Paul E. McKenney
2018-10-15  2:47         ` Joel Fernandes
2018-10-15  2:50           ` Joel Fernandes
2018-10-15  6:05           ` Nikolay Borisov
2018-10-15 11:21             ` Paul E. McKenney
2018-10-15 19:39               ` Joel Fernandes
2018-10-15 19:54                 ` Paul E. McKenney
2018-10-15 20:15                   ` Joel Fernandes
2018-10-15 21:08                     ` Paul E. McKenney
2018-10-16 11:26                       ` Paul E. McKenney
2018-10-16 20:41                         ` Joel Fernandes
2018-10-17 16:11                           ` Paul E. McKenney
2018-10-17 18:15                             ` Joel Fernandes
2018-10-17 20:33                               ` Paul E. McKenney
2018-10-18  2:07                                 ` Joel Fernandes
2018-10-18 14:46                                   ` Paul E. McKenney
2018-10-19  0:03                                     ` Joel Fernandes
2018-10-19  0:19                                       ` Paul E. McKenney
2018-10-19  1:12                                         ` Steven Rostedt
2018-10-19  1:27                                           ` Joel Fernandes
2018-10-19  1:26                                         ` Joel Fernandes
2018-10-19  1:50                                           ` Steven Rostedt
2018-10-19  2:25                                             ` Joel Fernandes
2018-10-19  2:52                                               ` Steven Rostedt
2018-10-19  3:58                                                 ` Joel Fernandes
2018-10-19 12:07                                                   ` Paul E. McKenney
2018-10-19 17:24                                                     ` Joel Fernandes
2018-10-19 18:11                                                       ` Paul E. McKenney
