rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
@ 2019-08-16  2:53 Joel Fernandes (Google)
  2019-08-16  2:53 ` [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Joel Fernandes (Google) @ 2019-08-16  2:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Fernandes (Google), rcu, Paul E. McKenney, frederic

This commit fixes the issue.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0512de9ead20..322b1b57967c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
 		   !rdp->dynticks_nmi_nesting &&
 		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
 		rdp->rcu_forced_tick = true;
-		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
+		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
 	}
 	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
 			  rdp->dynticks_nmi_nesting,
@@ -898,7 +898,7 @@ void rcu_irq_enter_irqson(void)
 void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
 {
 	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
-		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
+		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
 		rdp->rcu_forced_tick = false;
 	}
 }
@@ -2123,8 +2123,9 @@ int rcutree_dead_cpu(unsigned int cpu)
 	do_nocb_deferred_wakeup(per_cpu_ptr(&rcu_data, cpu));
 
 	// Stop-machine done, so allow nohz_full to disable tick.
-	for_each_online_cpu(c)
-		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
+	for_each_online_cpu(c) {
+		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
+	}
 	return 0;
 }
 
@@ -2175,8 +2176,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 
 	/* Invoke callbacks. */
-	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
-		tick_dep_set_task(current, TICK_DEP_MASK_RCU);
+	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
+		tick_dep_set_task(current, TICK_DEP_BIT_RCU);
+	}
 	rhp = rcu_cblist_dequeue(&rcl);
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		debug_rcu_head_unqueue(rhp);
@@ -2243,8 +2245,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Re-invoke RCU core processing if there are callbacks remaining. */
 	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
 		invoke_rcu_core();
-	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
-		tick_dep_clear_task(current, TICK_DEP_MASK_RCU);
+	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
+		tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
+	}
 }
 
 /*
@@ -3118,8 +3121,9 @@ int rcutree_online_cpu(unsigned int cpu)
 	rcutree_affinity_setting(cpu, -1);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
-	for_each_online_cpu(c)
-		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
+	for_each_online_cpu(c) {
+		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
+	}
 	return 0;
 }
 
@@ -3143,8 +3147,9 @@ int rcutree_offline_cpu(unsigned int cpu)
 	rcutree_affinity_setting(cpu, cpu);
 
 	// nohz_full CPUs need the tick for stop-machine to work quickly
-	for_each_online_cpu(c)
-		tick_dep_set_cpu(c, TICK_DEP_MASK_RCU);
+	for_each_online_cpu(c) {
+		tick_dep_set_cpu(c, TICK_DEP_BIT_RCU);
+	}
 	return 0;
 }
 
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI
  2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
@ 2019-08-16  2:53 ` Joel Fernandes (Google)
  2019-08-16  2:57   ` Joel Fernandes
  2019-08-16  2:53 ` [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes (Google) @ 2019-08-16  2:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Fernandes (Google), rcu, Paul E. McKenney, frederic

Sometimes I see rcu_urgent_qs is not set. This could be when the last
IPI was a long time ago, however, the grace period just started. Set
rcu_urgent_qs so the tick can indeed be stopped.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 322b1b57967c..856d3c9f1955 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1091,6 +1091,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	if (tick_nohz_full_cpu(rdp->cpu) &&
 		   time_after(jiffies,
 			      READ_ONCE(rdp->last_fqs_resched) + jtsq * 3)) {
+		WRITE_ONCE(*ruqp, true);
 		resched_cpu(rdp->cpu);
 		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
 	}
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
  2019-08-16  2:53 ` [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
@ 2019-08-16  2:53 ` Joel Fernandes (Google)
  2019-08-16 16:24   ` Joel Fernandes
  2019-08-16  2:59 ` [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes (Google) @ 2019-08-16  2:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Fernandes (Google), rcu, Paul E. McKenney, frederic

I really cannot explain this patch, but without it, the "else if" block
just doesn't execute thus causing the tick's dep mask to not be set and
causes the tick to be turned off.

I tried various _ONCE() macros but the only thing that works is this
patch.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 856d3c9f1955..ac6bcf7614d7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	long incby = 2;
+	int dnn = rdp->dynticks_nmi_nesting;
 
 	/* Complain about underflow. */
 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
@@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
 
 		incby = 1;
 	} else if (tick_nohz_full_cpu(rdp->cpu) &&
-		   !rdp->dynticks_nmi_nesting &&
+		   !dnn &&
 		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
 		rdp->rcu_forced_tick = true;
 		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI
  2019-08-16  2:53 ` [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
@ 2019-08-16  2:57   ` Joel Fernandes
  0 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2019-08-16  2:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: rcu, Paul E. McKenney, frederic

On Thu, Aug 15, 2019 at 10:53:10PM -0400, Joel Fernandes (Google) wrote:
> Sometimes I see rcu_urgent_qs is not set. This could be when the last
> IPI was a long time ago, however, the grace period just started. Set
> rcu_urgent_qs so the tick can indeed be stopped.
Here I meant:
                                   not be stopped.

I will resend the patch as v2.
thanks,

 - Joel


> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 322b1b57967c..856d3c9f1955 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1091,6 +1091,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  	if (tick_nohz_full_cpu(rdp->cpu) &&
>  		   time_after(jiffies,
>  			      READ_ONCE(rdp->last_fqs_resched) + jtsq * 3)) {
> +		WRITE_ONCE(*ruqp, true);
>  		resched_cpu(rdp->cpu);
>  		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
>  	}
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI
  2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
  2019-08-16  2:53 ` [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
  2019-08-16  2:53 ` [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance Joel Fernandes (Google)
@ 2019-08-16  2:59 ` Joel Fernandes (Google)
  2019-08-16 17:04   ` Paul E. McKenney
  2019-08-16 17:25 ` [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Paul E. McKenney
  2019-08-19 12:38 ` Frederic Weisbecker
  4 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes (Google) @ 2019-08-16  2:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Fernandes (Google), rcu, Paul E. McKenney, frederic

Sometimes I see rcu_urgent_qs is not set. This could be when the last
IPI was a long time ago, however, the grace period just started. Set
rcu_urgent_qs so the tick can indeed not be stopped.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 322b1b57967c..856d3c9f1955 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1091,6 +1091,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	if (tick_nohz_full_cpu(rdp->cpu) &&
 		   time_after(jiffies,
 			      READ_ONCE(rdp->last_fqs_resched) + jtsq * 3)) {
+		WRITE_ONCE(*ruqp, true);
 		resched_cpu(rdp->cpu);
 		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
 	}
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16  2:53 ` [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance Joel Fernandes (Google)
@ 2019-08-16 16:24   ` Joel Fernandes
  2019-08-16 16:52     ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2019-08-16 16:24 UTC (permalink / raw)
  To: linux-kernel, Paul E. McKenney; +Cc: rcu, frederic

On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> I really cannot explain this patch, but without it, the "else if" block
> just doesn't execute thus causing the tick's dep mask to not be set and
> causes the tick to be turned off.
> 
> I tried various _ONCE() macros but the only thing that works is this
> patch.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 856d3c9f1955..ac6bcf7614d7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  {
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  	long incby = 2;
> +	int dnn = rdp->dynticks_nmi_nesting;

I believe the accidental sign extension / conversion from long to int was
giving me an illusion since things started working well. Changing the 'int
dnn' to 'long dnn' gives similar behavior as without this patch! At least I
know now. Please feel free to ignore this particular RFC patch while I debug
this more (over the weekend or early next week). The first 2 patches are
good, just ignore this one.

thanks,

 - Joel


>  
>  	/* Complain about underflow. */
>  	WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  
>  		incby = 1;
>  	} else if (tick_nohz_full_cpu(rdp->cpu) &&
> -		   !rdp->dynticks_nmi_nesting &&
> +		   !dnn &&
>  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
>  		rdp->rcu_forced_tick = true;
>  		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16 16:24   ` Joel Fernandes
@ 2019-08-16 16:52     ` Paul E. McKenney
  2019-08-16 17:07       ` Joel Fernandes
  2019-08-19 12:59       ` Frederic Weisbecker
  0 siblings, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-16 16:52 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-kernel, rcu, frederic

On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > I really cannot explain this patch, but without it, the "else if" block
> > just doesn't execute thus causing the tick's dep mask to not be set and
> > causes the tick to be turned off.
> > 
> > I tried various _ONCE() macros but the only thing that works is this
> > patch.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 856d3c9f1955..ac6bcf7614d7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> >  {
> >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >  	long incby = 2;
> > +	int dnn = rdp->dynticks_nmi_nesting;
> 
> I believe the accidental sign extension / conversion from long to int was
> giving me an illusion since things started working well. Changing the 'int
> dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> know now. Please feel free to ignore this particular RFC patch while I debug
> this more (over the weekend or early next week). The first 2 patches are
> good, just ignore this one.

Ah, good point on the type!  So you were ending up with zero due to the
low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
the "!rdp->dynticks_nmi_nesting" instead needs to be something like
"rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
it is actually worse then the earlier comparison against the constant 2.

Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
nohz_full tick on upon irq enter instead of exit").

Or would that once again cause RCU to fail to enable the tick?

								Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> >  
> >  	/* Complain about underflow. */
> >  	WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> > @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> >  
> >  		incby = 1;
> >  	} else if (tick_nohz_full_cpu(rdp->cpu) &&
> > -		   !rdp->dynticks_nmi_nesting &&
> > +		   !dnn &&
> >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> >  		rdp->rcu_forced_tick = true;
> >  		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > -- 
> > 2.23.0.rc1.153.gdeed80330f-goog
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI
  2019-08-16  2:59 ` [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
@ 2019-08-16 17:04   ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-16 17:04 UTC (permalink / raw)
  To: Joel Fernandes (Google); +Cc: linux-kernel, rcu, frederic

On Thu, Aug 15, 2019 at 10:59:14PM -0400, Joel Fernandes (Google) wrote:
> Sometimes I see rcu_urgent_qs is not set. This could be when the last
> IPI was a long time ago, however, the grace period just started. Set
> rcu_urgent_qs so the tick can indeed not be stopped.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Great catch, queued, thank you!  I updated the commit log as shown below,
so could you please check to see if I messed anything up?

							Thanx, Paul

------------------------------------------------------------------------

commit 1f80df1e49b29ccd605882056a0565778e100e9e
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu Aug 15 22:59:14 2019 -0400

    rcu/tree: Ensure that ->rcu_urgent_qs is set before resched IPI
    
    The RCU-specific resched_cpu() function sends a resched IPI to the
    specified CPU, which can be used to force the tick on for a given
    nohz_full CPU.  This is needed when this nohz_full CPU is looping in the
    kernel while blocking the current grace period.  However, for the tick
    to actually be forced on in all cases, that CPU's rcu_data structure's
    ->rcu_urgent_qs flag must be set beforehand.  This commit therefore
    causes rcu_implicit_dynticks_qs() to set this flag prior to invoking
    resched_cpu() on a holdout nohz_full CPU.
    
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0512de9ead20..dd0a77bffa7d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1091,6 +1091,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	if (tick_nohz_full_cpu(rdp->cpu) &&
 		   time_after(jiffies,
 			      READ_ONCE(rdp->last_fqs_resched) + jtsq * 3)) {
+		WRITE_ONCE(*ruqp, true);
 		resched_cpu(rdp->cpu);
 		WRITE_ONCE(rdp->last_fqs_resched, jiffies);
 	}

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16 16:52     ` Paul E. McKenney
@ 2019-08-16 17:07       ` Joel Fernandes
  2019-08-16 17:30         ` Paul E. McKenney
  2019-08-19 12:59       ` Frederic Weisbecker
  1 sibling, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2019-08-16 17:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, rcu, frederic

On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > I really cannot explain this patch, but without it, the "else if" block
> > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > causes the tick to be turned off.
> > > 
> > > I tried various _ONCE() macros but the only thing that works is this
> > > patch.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > >  {
> > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > >  	long incby = 2;
> > > +	int dnn = rdp->dynticks_nmi_nesting;
> > 
> > I believe the accidental sign extension / conversion from long to int was
> > giving me an illusion since things started working well. Changing the 'int
> > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > know now. Please feel free to ignore this particular RFC patch while I debug
> > this more (over the weekend or early next week). The first 2 patches are
> > good, just ignore this one.
> 
> Ah, good point on the type!  So you were ending up with zero due to the
> low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> it is actually worse then the earlier comparison against the constant 2.
> 
> Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> nohz_full tick on upon irq enter instead of exit").

I think just using doing " == DYNTICK_IRQ_NONIDLE" as you mentioned should
make it work. I'll test that soon, thanks!

I would prefer not to revert that commit, and just make the above change.
Just because I feel this is safer. Since the tick is turned off in the IRQ
exit path, I am a bit worried about timing (does the tick turn off before RCU
sees the IRQ exit, or after it?). Either way, doing it on IRQ entry makes the
question irrelevant and immune to future changes in the timing.

Would you think the check for the nesting variable is more expensive to do on
IRQ entry than exit? If so, we could discuss doing it in the exit path,
otherwise we could doing on entry with just the above change in the equality
condition.

thanks,

 - Joel

> 
> 								Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > >  
> > >  	/* Complain about underflow. */
> > >  	WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
> > > @@ -826,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > >  
> > >  		incby = 1;
> > >  	} else if (tick_nohz_full_cpu(rdp->cpu) &&
> > > -		   !rdp->dynticks_nmi_nesting &&
> > > +		   !dnn &&
> > >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > >  		rdp->rcu_forced_tick = true;
> > >  		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > > -- 
> > > 2.23.0.rc1.153.gdeed80330f-goog
> > > 
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2019-08-16  2:59 ` [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
@ 2019-08-16 17:25 ` Paul E. McKenney
  2019-08-16 17:35   ` Joel Fernandes
  2019-08-19 12:38 ` Frederic Weisbecker
  4 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-16 17:25 UTC (permalink / raw)
  To: Joel Fernandes (Google); +Cc: linux-kernel, rcu, frederic

On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> This commit fixes the issue.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

And I am squashing these into their respective commits with attribution.
Good eyes, thank you very much!!!

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0512de9ead20..322b1b57967c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  		   !rdp->dynticks_nmi_nesting &&
>  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
>  		rdp->rcu_forced_tick = true;
> -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
>  	}
>  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
>  			  rdp->dynticks_nmi_nesting,
> @@ -898,7 +898,7 @@ void rcu_irq_enter_irqson(void)
>  void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
>  {
>  	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
> -		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> +		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
>  		rdp->rcu_forced_tick = false;
>  	}
>  }
> @@ -2123,8 +2123,9 @@ int rcutree_dead_cpu(unsigned int cpu)
>  	do_nocb_deferred_wakeup(per_cpu_ptr(&rcu_data, cpu));
>  
>  	// Stop-machine done, so allow nohz_full to disable tick.
> -	for_each_online_cpu(c)
> -		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
> +	for_each_online_cpu(c) {
> +		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
> +	}
>  	return 0;
>  }
>  
> @@ -2175,8 +2176,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	rcu_nocb_unlock_irqrestore(rdp, flags);
>  
>  	/* Invoke callbacks. */
> -	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
> -		tick_dep_set_task(current, TICK_DEP_MASK_RCU);
> +	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
> +		tick_dep_set_task(current, TICK_DEP_BIT_RCU);
> +	}
>  	rhp = rcu_cblist_dequeue(&rcl);
>  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>  		debug_rcu_head_unqueue(rhp);
> @@ -2243,8 +2245,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	/* Re-invoke RCU core processing if there are callbacks remaining. */
>  	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
>  		invoke_rcu_core();
> -	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
> -		tick_dep_clear_task(current, TICK_DEP_MASK_RCU);
> +	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
> +		tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
> +	}
>  }
>  
>  /*
> @@ -3118,8 +3121,9 @@ int rcutree_online_cpu(unsigned int cpu)
>  	rcutree_affinity_setting(cpu, -1);
>  
>  	// Stop-machine done, so allow nohz_full to disable tick.
> -	for_each_online_cpu(c)
> -		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
> +	for_each_online_cpu(c) {
> +		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
> +	}
>  	return 0;
>  }
>  
> @@ -3143,8 +3147,9 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	rcutree_affinity_setting(cpu, cpu);
>  
>  	// nohz_full CPUs need the tick for stop-machine to work quickly
> -	for_each_online_cpu(c)
> -		tick_dep_set_cpu(c, TICK_DEP_MASK_RCU);
> +	for_each_online_cpu(c) {
> +		tick_dep_set_cpu(c, TICK_DEP_BIT_RCU);
> +	}
>  	return 0;
>  }
>  
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16 17:07       ` Joel Fernandes
@ 2019-08-16 17:30         ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-16 17:30 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-kernel, rcu, frederic

On Fri, Aug 16, 2019 at 01:07:00PM -0400, Joel Fernandes wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > >  {
> > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > >  	long incby = 2;
> > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > it is actually worse then the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I think just using doing " == DYNTICK_IRQ_NONIDLE" as you mentioned should
> make it work. I'll test that soon, thanks!

My first step is indeed to add "== DYNTICK_IRQ_NONIDLE".

> I would prefer not to revert that commit, and just make the above change.
> Just because I feel this is safer. Since the tick is turned off in the IRQ
> exit path, I am a bit worried about timing (does the tick turn off before RCU
> sees the IRQ exit, or after it?). Either way, doing it on IRQ entry makes the
> question irrelevant and immune to future changes in the timing.

Well, comparing to 0x2 is probably cheaper than comparing to
0x4000000000000000 on most architectures.  Probably not a really big
deal, but this is after all the interrupt-entry fastpath.  Or is the
compiler trickier than I am giving it credit for?  (Still, seems like
comparing to 0x2 is one small instruction and to 0x4000000000000000 is
one big instruction at the very least, and probably several instructions
on many architectures.)

But to your point, if it is absolutely necessary to turn on the tick
at interrupt entry, then the larger comparison cannot be avoided.

> Would you think the check for the nesting variable is more expensive to do on
> IRQ entry than exit? If so, we could discuss doing it in the exit path,
> otherwise we could doing on entry with just the above change in the equality
> condition.

Yes, but let's see what is possible.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-16 17:25 ` [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Paul E. McKenney
@ 2019-08-16 17:35   ` Joel Fernandes
  0 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2019-08-16 17:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel, rcu, frederic

On Fri, Aug 16, 2019 at 10:25:29AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> > This commit fixes the issue.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> And I am squashing these into their respective commits with attribution.
> Good eyes, thank you very much!!!

Thank you!!

 - Joel

> 							Thanx, Paul
> 
> > ---
> >  kernel/rcu/tree.c | 29 +++++++++++++++++------------
> >  1 file changed, 17 insertions(+), 12 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0512de9ead20..322b1b57967c 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> >  		   !rdp->dynticks_nmi_nesting &&
> >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> >  		rdp->rcu_forced_tick = true;
> > -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> >  	}
> >  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> >  			  rdp->dynticks_nmi_nesting,
> > @@ -898,7 +898,7 @@ void rcu_irq_enter_irqson(void)
> >  void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> >  {
> >  	if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
> > -		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > +		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> >  		rdp->rcu_forced_tick = false;
> >  	}
> >  }
> > @@ -2123,8 +2123,9 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  	do_nocb_deferred_wakeup(per_cpu_ptr(&rcu_data, cpu));
> >  
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> > -	for_each_online_cpu(c)
> > -		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
> > +	for_each_online_cpu(c) {
> > +		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
> > +	}
> >  	return 0;
> >  }
> >  
> > @@ -2175,8 +2176,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> >  
> >  	/* Invoke callbacks. */
> > -	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
> > -		tick_dep_set_task(current, TICK_DEP_MASK_RCU);
> > +	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
> > +		tick_dep_set_task(current, TICK_DEP_BIT_RCU);
> > +	}
> >  	rhp = rcu_cblist_dequeue(&rcl);
> >  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >  		debug_rcu_head_unqueue(rhp);
> > @@ -2243,8 +2245,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	/* Re-invoke RCU core processing if there are callbacks remaining. */
> >  	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
> >  		invoke_rcu_core();
> > -	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
> > -		tick_dep_clear_task(current, TICK_DEP_MASK_RCU);
> > +	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
> > +		tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
> > +	}
> >  }
> >  
> >  /*
> > @@ -3118,8 +3121,9 @@ int rcutree_online_cpu(unsigned int cpu)
> >  	rcutree_affinity_setting(cpu, -1);
> >  
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> > -	for_each_online_cpu(c)
> > -		tick_dep_clear_cpu(c, TICK_DEP_MASK_RCU);
> > +	for_each_online_cpu(c) {
> > +		tick_dep_clear_cpu(c, TICK_DEP_BIT_RCU);
> > +	}
> >  	return 0;
> >  }
> >  
> > @@ -3143,8 +3147,9 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	rcutree_affinity_setting(cpu, cpu);
> >  
> >  	// nohz_full CPUs need the tick for stop-machine to work quickly
> > -	for_each_online_cpu(c)
> > -		tick_dep_set_cpu(c, TICK_DEP_MASK_RCU);
> > +	for_each_online_cpu(c) {
> > +		tick_dep_set_cpu(c, TICK_DEP_BIT_RCU);
> > +	}
> >  	return 0;
> >  }
> >  
> > -- 
> > 2.23.0.rc1.153.gdeed80330f-goog
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2019-08-16 17:25 ` [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Paul E. McKenney
@ 2019-08-19 12:38 ` Frederic Weisbecker
  2019-08-19 14:46   ` Paul E. McKenney
  4 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2019-08-19 12:38 UTC (permalink / raw)
  To: Joel Fernandes (Google); +Cc: linux-kernel, rcu, Paul E. McKenney

On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> This commit fixes the issue.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0512de9ead20..322b1b57967c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>  		   !rdp->dynticks_nmi_nesting &&
>  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
>  		rdp->rcu_forced_tick = true;
> -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);

Did I suggest you to use the _MASK_ value? That was a bit mean.
Sorry for all that lost debugging time :-(

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-16 16:52     ` Paul E. McKenney
  2019-08-16 17:07       ` Joel Fernandes
@ 2019-08-19 12:59       ` Frederic Weisbecker
  2019-08-19 14:22         ` Joel Fernandes
  2019-08-19 14:40         ` Paul E. McKenney
  1 sibling, 2 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2019-08-19 12:59 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Joel Fernandes, linux-kernel, rcu

On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > I really cannot explain this patch, but without it, the "else if" block
> > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > causes the tick to be turned off.
> > > 
> > > I tried various _ONCE() macros but the only thing that works is this
> > > patch.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > >  {
> > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > >  	long incby = 2;
> > > +	int dnn = rdp->dynticks_nmi_nesting;
> > 
> > I believe the accidental sign extension / conversion from long to int was
> > giving me an illusion since things started working well. Changing the 'int
> > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > know now. Please feel free to ignore this particular RFC patch while I debug
> > this more (over the weekend or early next week). The first 2 patches are
> > good, just ignore this one.
> 
> Ah, good point on the type!  So you were ending up with zero due to the
> low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> it is actually worse then the earlier comparison against the constant 2.
> 
> Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> nohz_full tick on upon irq enter instead of exit").

I can't find that patch so all I can say so far is that its title doesn't
inspire me much. Do you still need that change for some reason?

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-19 12:59       ` Frederic Weisbecker
@ 2019-08-19 14:22         ` Joel Fernandes
  2019-08-19 14:41           ` Paul E. McKenney
  2019-08-19 14:40         ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2019-08-19 14:22 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Paul E. McKenney, linux-kernel, rcu

On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > >  {
> > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > >  	long incby = 2;
> > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > it is actually worse then the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I can't find that patch so all I can say so far is that its title doesn't
> inspire me much. Do you still need that change for some reason?

No we don't need it. Paul's dev branch fixed it by checking DYNTICK_IRQ_NONIDLE:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-19 12:59       ` Frederic Weisbecker
  2019-08-19 14:22         ` Joel Fernandes
@ 2019-08-19 14:40         ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-19 14:40 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Joel Fernandes, linux-kernel, rcu

On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > I really cannot explain this patch, but without it, the "else if" block
> > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > causes the tick to be turned off.
> > > > 
> > > > I tried various _ONCE() macros but the only thing that works is this
> > > > patch.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > >  {
> > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > >  	long incby = 2;
> > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > 
> > > I believe the accidental sign extension / conversion from long to int was
> > > giving me an illusion since things started working well. Changing the 'int
> > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > this more (over the weekend or early next week). The first 2 patches are
> > > good, just ignore this one.
> > 
> > Ah, good point on the type!  So you were ending up with zero due to the
> > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > it is actually worse then the earlier comparison against the constant 2.
> > 
> > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > nohz_full tick on upon irq enter instead of exit").
> 
> I can't find that patch so all I can say so far is that its title doesn't
> inspire me much. Do you still need that change for some reason?

It is in -rcu branch dev, but has been rebased.  The current version
is 227482fd4f3e ("rcu: Force nohz_full tick on upon irq enter instead
of exit").

It is not yet clear to me whether this is needed or not.  I -think- that
it is not, but without it, it is possible that some chain of events would
result in the rcu_data structure's ->rcu_urgent_qs field being cleared
before the interrupt-exit code could sample it, which might possibly
result in the tick remaining off.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-19 14:22         ` Joel Fernandes
@ 2019-08-19 14:41           ` Paul E. McKenney
  2019-08-19 15:46             ` Joel Fernandes
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-19 14:41 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: Frederic Weisbecker, linux-kernel, rcu

On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > > I really cannot explain this patch, but without it, the "else if" block
> > > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > > causes the tick to be turned off.
> > > > > 
> > > > > I tried various _ONCE() macros but the only thing that works is this
> > > > > patch.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > > >  {
> > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > >  	long incby = 2;
> > > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > > 
> > > > I believe the accidental sign extension / conversion from long to int was
> > > > giving me an illusion since things started working well. Changing the 'int
> > > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > > this more (over the weekend or early next week). The first 2 patches are
> > > > good, just ignore this one.
> > > 
> > > Ah, good point on the type!  So you were ending up with zero due to the
> > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > it is actually worse then the earlier comparison against the constant 2.
> > > 
> > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > nohz_full tick on upon irq enter instead of exit").
> > 
> > I can't find that patch so all I can say so far is that its title doesn't
> > inspire me much. Do you still need that change for some reason?
> 
> No we don't need it. Paul's dev branch fixed it by checking DYNTICK_IRQ_NONIDLE:
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0

Ah, so you have tested reverting this?  If so, thank you very much!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-19 12:38 ` Frederic Weisbecker
@ 2019-08-19 14:46   ` Paul E. McKenney
  2019-08-19 16:32     ` Frederic Weisbecker
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-19 14:46 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Joel Fernandes (Google), linux-kernel, rcu

On Mon, Aug 19, 2019 at 02:38:38PM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> > This commit fixes the issue.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 29 +++++++++++++++++------------
> >  1 file changed, 17 insertions(+), 12 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0512de9ead20..322b1b57967c 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> >  		   !rdp->dynticks_nmi_nesting &&
> >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> >  		rdp->rcu_forced_tick = true;
> > -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> 
> Did I suggest you to use the _MASK_ value? That was a bit mean.
> Sorry for all that lost debugging time :-(

At some point, I should have looked at the other calls to these
functions.  :-/

But would the following patch make sense?  This would not help for (say)
use of TICK_MASK_BIT_POSIX_TIMER instead of TICK_DEP_BIT_POSIX_TIMER, but
would help for any new values that might be added later on.  And currently
for TICK_DEP_MASK_CLOCK_UNSTABLE and TICK_DEP_MASK_RCU.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 39eb44564058..4ed788ce5762 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -111,6 +111,7 @@ enum tick_dep_bits {
 	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3,
 	TICK_DEP_BIT_RCU		= 4
 };
+#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU
 
 #define TICK_DEP_MASK_NONE		0
 #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_DEP_BIT_POSIX_TIMER)
@@ -215,24 +216,28 @@ extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
  */
 static inline void tick_dep_set(enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_set(bit);
 }
 
 static inline void tick_dep_clear(enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_clear(bit);
 }
 
 static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_cpu(cpu))
 		tick_nohz_dep_set_cpu(cpu, bit);
 }
 
 static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_cpu(cpu))
 		tick_nohz_dep_clear_cpu(cpu, bit);
 }
@@ -240,24 +245,28 @@ static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
 static inline void tick_dep_set_task(struct task_struct *tsk,
 				     enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_set_task(tsk, bit);
 }
 static inline void tick_dep_clear_task(struct task_struct *tsk,
 				       enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_clear_task(tsk, bit);
 }
 static inline void tick_dep_set_signal(struct signal_struct *signal,
 				       enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_set_signal(signal, bit);
 }
 static inline void tick_dep_clear_signal(struct signal_struct *signal,
 					 enum tick_dep_bits bit)
 {
+	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
 	if (tick_nohz_full_enabled())
 		tick_nohz_dep_clear_signal(signal, bit);
 }

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-19 14:41           ` Paul E. McKenney
@ 2019-08-19 15:46             ` Joel Fernandes
  2019-08-19 16:17               ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2019-08-19 15:46 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Frederic Weisbecker, linux-kernel, rcu

On Mon, Aug 19, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> > On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > > > I really cannot explain this patch, but without it, the "else if" block
> > > > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > > > causes the tick to be turned off.
> > > > > > 
> > > > > > I tried various _ONCE() macros but the only thing that works is this
> > > > > > patch.
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > > > >  {
> > > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > >  	long incby = 2;
> > > > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > > > 
> > > > > I believe the accidental sign extension / conversion from long to int was
> > > > > giving me an illusion since things started working well. Changing the 'int
> > > > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > > > this more (over the weekend or early next week). The first 2 patches are
> > > > > good, just ignore this one.
> > > > 
> > > > Ah, good point on the type!  So you were ending up with zero due to the
> > > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > > it is actually worse then the earlier comparison against the constant 2.
> > > > 
> > > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > > nohz_full tick on upon irq enter instead of exit").
> > > 
> > > I can't find that patch so all I can say so far is that its title doesn't
> > > inspire me much. Do you still need that change for some reason?
> > 
> > No we don't need it. Paul's dev branch fixed it by checking DYNTICK_IRQ_NONIDLE:
> > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0
> 
> Ah, so you have tested reverting this?  If so, thank you very much!

Just tried reverting, and found a bug if done in the reverted way. Sent you
email with a proposed change which is essentially the top of tree:
https://github.com/joelagnel/linux-kernel/commits/rcu/nohz-test-3

Also for Frederick, I wanted to mention why my pure hack above (dnn variable)
seemed to work. The reason was because of long to int conversion of
rdp->dynticks_nmi_nesting which I surprisingly did not get a compiler warning
for. dynticks_nmi_nesting getting converted to int was truncating the
DYNTICK_IRQ_NONIDLE bit (in fact I believe this was due to the cltq
instruction in x86). This caused the "else if" condition to always evaluate
to true and turn off the tick.

Paul, I wanted to see if I can create a repeatable test case for this issue.
Not a full blown RCU torture test, but something that one could run and get a
PASS or FAIL. Do you think this could be useful? And what is the best place
for such a test?
Essentially the test would be:
1. Run a test and dump some traces.
2. Parse the traces and see if things are sane (such as the tick not turning
   off for this issue).
3. Report pass or fail.

The other way instead of parsing traces could be, a kernel module that does
trace_probe_register on various tracepoints and tries to see if the tick
indeed could stay turned on. Then report pass/fail at the end of the module's
execution.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance
  2019-08-19 15:46             ` Joel Fernandes
@ 2019-08-19 16:17               ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-19 16:17 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: Frederic Weisbecker, linux-kernel, rcu

On Mon, Aug 19, 2019 at 11:46:36AM -0400, Joel Fernandes wrote:
> On Mon, Aug 19, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 19, 2019 at 10:22:08AM -0400, Joel Fernandes wrote:
> > > On Mon, Aug 19, 2019 at 02:59:08PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Aug 16, 2019 at 09:52:42AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Aug 16, 2019 at 12:24:04PM -0400, Joel Fernandes wrote:
> > > > > > On Thu, Aug 15, 2019 at 10:53:11PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > I really cannot explain this patch, but without it, the "else if" block
> > > > > > > just doesn't execute thus causing the tick's dep mask to not be set and
> > > > > > > causes the tick to be turned off.
> > > > > > > 
> > > > > > > I tried various _ONCE() macros but the only thing that works is this
> > > > > > > patch.
> > > > > > > 
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 3 ++-
> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 856d3c9f1955..ac6bcf7614d7 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -802,6 +802,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > > > > >  {
> > > > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > >  	long incby = 2;
> > > > > > > +	int dnn = rdp->dynticks_nmi_nesting;
> > > > > > 
> > > > > > I believe the accidental sign extension / conversion from long to int was
> > > > > > giving me an illusion since things started working well. Changing the 'int
> > > > > > dnn' to 'long dnn' gives similar behavior as without this patch! At least I
> > > > > > know now. Please feel free to ignore this particular RFC patch while I debug
> > > > > > this more (over the weekend or early next week). The first 2 patches are
> > > > > > good, just ignore this one.
> > > > > 
> > > > > Ah, good point on the type!  So you were ending up with zero due to the
> > > > > low-order 32 bits of DYNTICK_IRQ_NONIDLE being zero, correct?  If so,
> > > > > the "!rdp->dynticks_nmi_nesting" instead needs to be something like
> > > > > "rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE", which sounds like
> > > > > it is actually worse then the earlier comparison against the constant 2.
> > > > > 
> > > > > Sounds like I should revert the -rcu commit 805a16eaefc3 ("rcu: Force
> > > > > nohz_full tick on upon irq enter instead of exit").
> > > > 
> > > > I can't find that patch so all I can say so far is that its title doesn't
> > > > inspire me much. Do you still need that change for some reason?
> > > 
> > > No we don't need it. Paul's dev branch fixed it by checking DYNTICK_IRQ_NONIDLE:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=227482fd4f3ede0502b586da28a59971dfbac0b0
> > 
> > Ah, so you have tested reverting this?  If so, thank you very much!
> 
> Just tried reverting, and found a bug if done in the reverted way. Sent you
> email with a proposed change which is essentially the top of tree:
> https://github.com/joelagnel/linux-kernel/commits/rcu/nohz-test-3
> 
> Also for Frederick, I wanted to mention why my pure hack above (dnn variable)
> seemed to work. The reason was because of long to int conversion of
> rdp->dynticks_nmi_nesting which I surprisingly did not get a compiler warning
> for. dynticks_nmi_nesting getting converted to int was truncating the
> DYNTICK_IRQ_NONIDLE bit (in fact I believe this was due to the cltq
> instruction in x86). This caused the "else if" condition to always evaluate
> to true and turn off the tick.
> 
> Paul, I wanted to see if I can create a repeatable test case for this issue.
> Not a full blown RCU torture test, but something that one could run and get a
> PASS or FAIL. Do you think this could be useful? And what is the best place
> for such a test?
> Essentially the test would be:
> 1. Run a test and dump some traces.
> 2. Parse the traces and see if things are sane (such as the tick not turning
>    off for this issue).
> 3. Report pass or fail.
> 
> The other way instead of parsing traces could be, a kernel module that does
> trace_probe_register on various tracepoints and tries to see if the tick
> indeed could stay turned on. Then report pass/fail at the end of the module's
> execution.

Or you could increment a per-CPU counter in rcu_sched_clock_irq() and use
that to verify the tick.  Maybe you could use the existing ->ticks_this_gp,
though that does get zeroed at the beginning of each grace period, which
would make sampling it a bit trickier.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-19 14:46   ` Paul E. McKenney
@ 2019-08-19 16:32     ` Frederic Weisbecker
  2019-08-19 16:44       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2019-08-19 16:32 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Joel Fernandes (Google), linux-kernel, rcu

On Mon, Aug 19, 2019 at 07:46:32AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 19, 2019 at 02:38:38PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> > > This commit fixes the issue.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 29 +++++++++++++++++------------
> > >  1 file changed, 17 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 0512de9ead20..322b1b57967c 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > >  		   !rdp->dynticks_nmi_nesting &&
> > >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > >  		rdp->rcu_forced_tick = true;
> > > -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > > +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > 
> > Did I suggest you to use the _MASK_ value? That was a bit mean.
> > Sorry for all that lost debugging time :-(
> 
> At some point, I should have looked at the other calls to these
> functions.  :-/
> 
> But would the following patch make sense?  This would not help for (say)
> use of TICK_MASK_BIT_POSIX_TIMER instead of TICK_DEP_BIT_POSIX_TIMER, but
> would help for any new values that might be added later on.  And currently
> for TICK_DEP_MASK_CLOCK_UNSTABLE and TICK_DEP_MASK_RCU.

I'd rather make the TICK_DEP_MASK_* values private to kernel/time/tick-sched.c but
that means I need to re-arrange a bit include/trace/events/timer.h

I'm looking into it. Meanwhile, your below patch that checks for the max value is
still valuable.

Thanks.

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 39eb44564058..4ed788ce5762 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -111,6 +111,7 @@ enum tick_dep_bits {
>  	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3,
>  	TICK_DEP_BIT_RCU		= 4
>  };
> +#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU
>  
>  #define TICK_DEP_MASK_NONE		0
>  #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_DEP_BIT_POSIX_TIMER)
> @@ -215,24 +216,28 @@ extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
>   */
>  static inline void tick_dep_set(enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_set(bit);
>  }
>  
>  static inline void tick_dep_clear(enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_clear(bit);
>  }
>  
>  static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_cpu(cpu))
>  		tick_nohz_dep_set_cpu(cpu, bit);
>  }
>  
>  static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_cpu(cpu))
>  		tick_nohz_dep_clear_cpu(cpu, bit);
>  }
> @@ -240,24 +245,28 @@ static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
>  static inline void tick_dep_set_task(struct task_struct *tsk,
>  				     enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_set_task(tsk, bit);
>  }
>  static inline void tick_dep_clear_task(struct task_struct *tsk,
>  				       enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_clear_task(tsk, bit);
>  }
>  static inline void tick_dep_set_signal(struct signal_struct *signal,
>  				       enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_set_signal(signal, bit);
>  }
>  static inline void tick_dep_clear_signal(struct signal_struct *signal,
>  					 enum tick_dep_bits bit)
>  {
> +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
>  	if (tick_nohz_full_enabled())
>  		tick_nohz_dep_clear_signal(signal, bit);
>  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-19 16:32     ` Frederic Weisbecker
@ 2019-08-19 16:44       ` Paul E. McKenney
  2019-08-20 12:08         ` Frederic Weisbecker
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-19 16:44 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Joel Fernandes (Google), linux-kernel, rcu

On Mon, Aug 19, 2019 at 06:32:27PM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 19, 2019 at 07:46:32AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 19, 2019 at 02:38:38PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Aug 15, 2019 at 10:53:09PM -0400, Joel Fernandes (Google) wrote:
> > > > This commit fixes the issue.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 29 +++++++++++++++++------------
> > > >  1 file changed, 17 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 0512de9ead20..322b1b57967c 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -829,7 +829,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > > >  		   !rdp->dynticks_nmi_nesting &&
> > > >  		   rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > > >  		rdp->rcu_forced_tick = true;
> > > > -		tick_dep_set_cpu(rdp->cpu, TICK_DEP_MASK_RCU);
> > > > +		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > > 
> > > Did I suggest you to use the _MASK_ value? That was a bit mean.
> > > Sorry for all that lost debugging time :-(
> > 
> > At some point, I should have looked at the other calls to these
> > functions.  :-/
> > 
> > But would the following patch make sense?  This would not help for (say)
> > use of TICK_MASK_BIT_POSIX_TIMER instead of TICK_DEP_BIT_POSIX_TIMER, but
> > would help for any new values that might be added later on.  And currently
> > for TICK_DEP_MASK_CLOCK_UNSTABLE and TICK_DEP_MASK_RCU.
> 
> I'd rather make the TICK_DEP_MASK_* values private to kernel/time/tick-sched.c but
> that means I need to re-arrange a bit include/trace/events/timer.h

That would be even better!  For one thing, it would detect misuse of
-all- of the _MASK_ values.  ;-)

> I'm looking into it. Meanwhile, your below patch that checks for the max value is
> still valuable.

If I were to push it, it would be v5.5 before it showed up.  My guess
is therefore that I should keep it for my own internal use in the near
term, but not push it.  If you would like to take it, feel free to use
my Signed-off-by.

							Thanx, Paul

> Thanks.
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 39eb44564058..4ed788ce5762 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -111,6 +111,7 @@ enum tick_dep_bits {
> >  	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3,
> >  	TICK_DEP_BIT_RCU		= 4
> >  };
> > +#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU
> >  
> >  #define TICK_DEP_MASK_NONE		0
> >  #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_DEP_BIT_POSIX_TIMER)
> > @@ -215,24 +216,28 @@ extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
> >   */
> >  static inline void tick_dep_set(enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_set(bit);
> >  }
> >  
> >  static inline void tick_dep_clear(enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_clear(bit);
> >  }
> >  
> >  static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_cpu(cpu))
> >  		tick_nohz_dep_set_cpu(cpu, bit);
> >  }
> >  
> >  static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_cpu(cpu))
> >  		tick_nohz_dep_clear_cpu(cpu, bit);
> >  }
> > @@ -240,24 +245,28 @@ static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
> >  static inline void tick_dep_set_task(struct task_struct *tsk,
> >  				     enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_set_task(tsk, bit);
> >  }
> >  static inline void tick_dep_clear_task(struct task_struct *tsk,
> >  				       enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_clear_task(tsk, bit);
> >  }
> >  static inline void tick_dep_set_signal(struct signal_struct *signal,
> >  				       enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_set_signal(signal, bit);
> >  }
> >  static inline void tick_dep_clear_signal(struct signal_struct *signal,
> >  					 enum tick_dep_bits bit)
> >  {
> > +	WARN_ON_ONCE(bit > TICK_DEP_BIT_MAX);
> >  	if (tick_nohz_full_enabled())
> >  		tick_nohz_dep_clear_signal(signal, bit);
> >  }
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-19 16:44       ` Paul E. McKenney
@ 2019-08-20 12:08         ` Frederic Weisbecker
  2019-08-20 14:44           ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2019-08-20 12:08 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Joel Fernandes (Google), linux-kernel, rcu

On Mon, Aug 19, 2019 at 09:44:20AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 19, 2019 at 06:32:27PM +0200, Frederic Weisbecker wrote:
> > > But would the following patch make sense?  This would not help for (say)
> > > use of TICK_MASK_BIT_POSIX_TIMER instead of TICK_DEP_BIT_POSIX_TIMER, but
> > > would help for any new values that might be added later on.  And currently
> > > for TICK_DEP_MASK_CLOCK_UNSTABLE and TICK_DEP_MASK_RCU.
> > 
> > I'd rather make the TICK_DEP_MASK_* values private to kernel/time/tick-sched.c but
> > that means I need to re-arrange a bit include/trace/events/timer.h
> 
> That would be even better!  For one thing, it would detect misuse of
> -all- of the _MASK_ values.  ;-)

:o)

> 
> > I'm looking into it. Meanwhile, your below patch that checks for the max value is
> > still valuable.
> 
> If I were to push it, it would be v5.5 before it showed up.  My guess
> is therefore that I should keep it for my own internal use in the near
> term, but not push it.  If you would like to take it, feel free to use
> my Signed-off-by.

Ok, applying.

Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks
  2019-08-20 12:08         ` Frederic Weisbecker
@ 2019-08-20 14:44           ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2019-08-20 14:44 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Joel Fernandes (Google), linux-kernel, rcu

On Tue, Aug 20, 2019 at 02:08:45PM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 19, 2019 at 09:44:20AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 19, 2019 at 06:32:27PM +0200, Frederic Weisbecker wrote:
> > > > But would the following patch make sense?  This would not help for (say)
> > > > use of TICK_MASK_BIT_POSIX_TIMER instead of TICK_DEP_BIT_POSIX_TIMER, but
> > > > would help for any new values that might be added later on.  And currently
> > > > for TICK_DEP_MASK_CLOCK_UNSTABLE and TICK_DEP_MASK_RCU.
> > > 
> > > I'd rather make the TICK_DEP_MASK_* values private to kernel/time/tick-sched.c but
> > > that means I need to re-arrange a bit include/trace/events/timer.h
> > 
> > That would be even better!  For one thing, it would detect misuse of
> > -all- of the _MASK_ values.  ;-)
> 
> :o)
> 
> > 
> > > I'm looking into it. Meanwhile, your below patch that checks for the max value is
> > > still valuable.
> > 
> > If I were to push it, it would be v5.5 before it showed up.  My guess
> > is therefore that I should keep it for my own internal use in the near
> > term, but not push it.  If you would like to take it, feel free to use
> > my Signed-off-by.
> 
> Ok, applying.

Thank you, Frederic!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-08-20 14:44 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16  2:53 [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Joel Fernandes (Google)
2019-08-16  2:53 ` [PATCH -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
2019-08-16  2:57   ` Joel Fernandes
2019-08-16  2:53 ` [PATCH -rcu dev 3/3] RFC: rcu/tree: Read dynticks_nmi_nesting in advance Joel Fernandes (Google)
2019-08-16 16:24   ` Joel Fernandes
2019-08-16 16:52     ` Paul E. McKenney
2019-08-16 17:07       ` Joel Fernandes
2019-08-16 17:30         ` Paul E. McKenney
2019-08-19 12:59       ` Frederic Weisbecker
2019-08-19 14:22         ` Joel Fernandes
2019-08-19 14:41           ` Paul E. McKenney
2019-08-19 15:46             ` Joel Fernandes
2019-08-19 16:17               ` Paul E. McKenney
2019-08-19 14:40         ` Paul E. McKenney
2019-08-16  2:59 ` [PATCH v2 -rcu dev 2/3] rcu/tree: Fix issue where sometimes rcu_urgent_qs is not set on IPI Joel Fernandes (Google)
2019-08-16 17:04   ` Paul E. McKenney
2019-08-16 17:25 ` [PATCH -rcu dev 1/3] rcu/tree: tick_dep_set/clear_cpu should accept bits instead of masks Paul E. McKenney
2019-08-16 17:35   ` Joel Fernandes
2019-08-19 12:38 ` Frederic Weisbecker
2019-08-19 14:46   ` Paul E. McKenney
2019-08-19 16:32     ` Frederic Weisbecker
2019-08-19 16:44       ` Paul E. McKenney
2019-08-20 12:08         ` Frederic Weisbecker
2019-08-20 14:44           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).