Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")

From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Ingo Molnar <mingo@redhat.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Petr Mladek <pmladek@suse.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Yafang Shao <laoar.shao@gmail.com>
Subject: Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
Date: Thu, 5 Sep 2019 20:01:37 -0400
Message-ID: <20190906000137.GA224720@google.com>
In-Reply-To: <20190905164329.GT4125@linux.ibm.com>

On Thu, Sep 05, 2019 at 09:43:29AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 05, 2019 at 11:36:20AM -0400, Joel Fernandes wrote:
> > On Wed, Sep 04, 2019 at 04:13:08PM -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 04, 2019 at 09:54:20AM -0400, Joel Fernandes wrote:
> > > > On Wed, Sep 04, 2019 at 03:12:10AM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Sep 04, 2019 at 12:59:10AM -0400, Joel Fernandes wrote:
> > > > > > On Tue, Sep 03, 2019 at 01:02:49PM -0700, Paul E. McKenney wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > If this task gets delayed betweentimes, rcu_implicit_dynticks_qs() would
> > > > > fail to set .rcu_need_heavy_qs because it saw it already being set,
> > > > > even though the corresponding ->dynticks update had already happened.
> > > > > (It might be a new grace period, given that the old grace period might
> > > > > have ended courtesy of the atomic_add_return().)
> > > > 
> > > > Makes sense and I agree.
> > > > 
> > > > Also, I would really appreciate it if you could fix the nits in the above
> > > > patch we're reviewing and apply it (if you can).
> > > > I think there are only two changes left:
> > > > - rename special to seq.
> > > > - reorder the rcu_need_heavy_qs write.
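For the second item, a minimal userspace C11 sketch of the ordering may help. It assumes the fix is to clear the need-heavy-QS hint before the atomic increment of the dynticks-style counter, so that a delay between the two steps can no longer wipe out a request made on behalf of a newer grace period. The names struct cpu_hints, momentary_qs() and request_heavy_qs() are hypothetical stand-ins, not the kernel's rcu_data fields or functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state, loosely modeled on the fields discussed here. */
struct cpu_hints {
	atomic_int dynticks;       /* counter sampled by the grace-period side */
	atomic_bool need_heavy_qs; /* hint: "report a quiescent state soon" */
};

/*
 * CPU side: report a momentary quiescent state.  The hint is cleared
 * BEFORE the counter is bumped.  Once the increment is visible, the old
 * grace period may end and a new one may re-request a heavy QS; a clear
 * issued after the increment could silently discard that new request.
 */
int momentary_qs(struct cpu_hints *h)
{
	atomic_store(&h->need_heavy_qs, false);
	return atomic_fetch_add(&h->dynticks, 2) + 2;
}

/*
 * Grace-period side: set the hint only if it is not already set, which is
 * the "saw it already being set" behavior described above.  Returns true
 * if this call was the one that set it.
 */
bool request_heavy_qs(struct cpu_hints *h)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&h->need_heavy_qs, &expected, true);
}
```

With this ordering, a request that arrives between the clear and the increment survives and at worst costs one extra heavy quiescent state; with the opposite ordering it can be lost.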
> > > > 
> > > > On a related point, when I was working on the NOHZ_FULL testing, I noticed a
> > > > weird issue where rcu_urgent_qs was reset but rcu_need_heavy_qs was still
> > > > set indefinitely. I am a bit afraid our hints are not being cleared
> > > > appropriately, and I believe I fixed a similar issue a few months ago. I
> > > > would rather have them cleared once they are no longer needed. What do you
> > > > think about the below patch? I have not submitted it yet because I was
> > > > working on other patches.
> > > > 
> > > > ---8<-----------------------
> > > > 
> > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > Subject: [RFC] rcu/tree: Reset CPU hints when reporting a quiescent state
> > > > 
> > > > While tracing, I am seeing cases where, after a quiescent state is
> > > > reported, need_heavy_qs is still set even though urgent_qs was cleared.
> > > > One such case is when the softirq reports that a CPU has passed through
> > > > a quiescent state.
> > > > 
> > > > Previously, in 671a63517cf9 ("rcu: Avoid unnecessary softirq when system
> > > > is idle"), I fixed a bug where core_needs_qs was not being cleared.
> > > > I worry we keep running into similar situations. Let us just add a
> > > > function to clear the hints and call it from all relevant places. This
> > > > makes the code more robust and avoids stale hints, which could, at least
> > > > in theory, act as false hints after the quiescent state was already
> > > > reported.
> > > > 
> > > > Tested overnight, with rcutorture running for 60 minutes on each RCU
> > > > configuration.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 17 ++++++++++++++++-
> > > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > Excellent point!  But how about if we combine it with the existing
> > > disabling of the scheduler tick, perhaps something like the following?
> > > 
> > > Note that the FQS clearing can come from some other CPU, hence the added
> > > {READ,WRITE}_ONCE() calls.  The call is moved down in rcu_report_qs_rdp()
> > > because something would have had to clear the bit to prevent execution
> > > from getting there, and I believe that the other bit-clearing events
> > > have calls to rcu_disable_urgency_upon_qs().  (But I easily could have
> > > missed something!)
> > 
> > Is there any harm in just clearing it earlier in rcu_report_qs_rdp()? If
> > not, let us just play it safe and do it that way (clear earlier in
> > rcu_report_qs_rdp())?
> 
> Maybe...
> 
> But given that missing a path doesn't cause a major failure (too-short
> grace period, for example), I am more inclined to find the paths and
> fix them as needed.  Especially given that my ignorance of any path to
> a quiescent state likely hides a serious bug.

OK, that's fine.

> > And I am guessing the __this_cpu_read(rcu_data.core_needs_qs) in
> > rcu_flavor_sched_clock_irq() implies READ_ONCE(), so there is no need for
> > READ_ONCE() there, right?
> 
> Assembly in x86.  Not so much on other architectures, though.  ;-)
> See raw_cpu_generic_read().

Interesting. That one seems to be a plain access; I wonder why it does not use
READ_ONCE() or a volatile access there.
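For reference, a simplified userspace model of what READ_ONCE() adds over a plain access such as the one in raw_cpu_generic_read(). The SKETCH_READ_ONCE()/SKETCH_WRITE_ONCE() macros and the functions below are hypothetical; they only capture the core idea (a volatile-qualified access that the compiler must emit exactly once), while the kernel's real macros handle more cases. Like the kernel, this relies on GCC/Clang behavior for volatile and __typeof__ rather than on ISO C atomics.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for READ_ONCE()/WRITE_ONCE() on scalar types:
 * access through a volatile-qualified lvalue, so the compiler may not
 * cache, fuse, or omit the load or store.
 */
#define SKETCH_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical flag that another CPU might clear concurrently. */
static bool core_needs_qs = true;

/*
 * With a plain "while (core_needs_qs)" the compiler may load the flag
 * once and reuse the value; the volatile read forces a fresh load on
 * every iteration.  Bounded by max_polls so the example terminates even
 * when nothing clears the flag.
 */
int poll_core_needs_qs(int max_polls)
{
	int polls = 0;

	while (SKETCH_READ_ONCE(core_needs_qs) && polls < max_polls)
		polls++;
	return polls;
}

/* Writer side, which in real life would run on another CPU. */
void clear_core_needs_qs(void)
{
	SKETCH_WRITE_ONCE(core_needs_qs, false);
}
```

In a real two-CPU scenario, an unbounded plain-access version of the polling loop could be compiled into a single load followed by an infinite loop, which is the kind of transformation the volatile access rules out.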

> > > @@ -3004,7 +3007,7 @@ static int rcu_pending(void)
> > >  		return 0;
> > >  
> > >  	/* Is the RCU core waiting for a quiescent state from this CPU? */
> > > -	if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm)
> > > +	if (READ_ONCE(rdp->core_needs_qs) && !rdp->cpu_no_qs.b.norm)
> > >  		return 1;
> > >  
> > >  	/* Does this CPU have callbacks ready to invoke? */
> > > @@ -3244,7 +3247,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	rdp->gp_seq = rnp->gp_seq;
> > >  	rdp->gp_seq_needed = rnp->gp_seq;
> > >  	rdp->cpu_no_qs.b.norm = true;
> > > -	rdp->core_needs_qs = false;
> > 
> > How about calling the new hint-clearing function here as well? Just for
> > robustness and consistency purposes?
> 
> This and the next function are both called during a CPU-hotplug online
> operation, so there is little robustness or consistency to be had by
> doing it twice.

OK, sorry, I missed that you are clearing it below in the next function. That's
fine with me.

This patch looks good to me, and I am OK with merging these changes into the
original patch with my authorship, as you mentioned. Or if you want to be the
author, that's fine too :)

Let me know if anything else is needed for it, thanks!

 - Joel

> > thanks,
> > 
> >  - Joel
> > 
> > >  	rdp->rcu_iw_pending = false;
> > >  	rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
> > >  	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > > @@ -3359,7 +3361,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > >  	rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > >  	rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > >  	if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > -		rcu_disable_tick_upon_qs(rdp);
> > > +		rcu_disable_urgency_upon_qs(rdp);
> > >  		/* Report QS -after- changing ->qsmaskinitnext! */
> > >  		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > >  	} else {