[tip/core/rcu,1/3] rcu: Provide polling interfaces for Tree RCU grace periods
diff mbox series

Message ID 20210304002632.23870-1-paulmck@kernel.org
State New, archived
Headers show
Series
  • Polling RCU grace-period interfaces for v5.13
Related show

Commit Message

Paul E. McKenney March 4, 2021, 12:26 a.m. UTC
From: "Paul E. McKenney" <paulmck@kernel.org>

There is a need for a non-blocking polling interface for RCU grace
periods, so this commit supplies start_poll_synchronize_rcu() and
poll_state_synchronize_rcu() for this purpose.  Note that the existing
get_state_synchronize_rcu() may be used if future grace periods are
inevitable (perhaps due to a later call_rcu() invocation).  The new
start_poll_synchronize_rcu() is to be used if future grace periods
might not otherwise happen.  Finally, poll_state_synchronize_rcu()
provides a lockless check for a grace period having elapsed since
the corresponding call to either of the get_state_synchronize_rcu()
or start_poll_synchronize_rcu().

As with get_state_synchronize_rcu(), the return value from either
get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
to a later call to either poll_state_synchronize_rcu() or the existing
(might_sleep) cond_synchronize_rcu().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rcutree.h |  2 ++
 kernel/rcu/tree.c       | 70 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 67 insertions(+), 5 deletions(-)

Comments

Frederic Weisbecker March 12, 2021, 12:21 p.m. UTC | #1
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There is a need for a non-blocking polling interface for RCU grace
> periods, so this commit supplies start_poll_synchronize_rcu() and
> poll_state_synchronize_rcu() for this purpose.  Note that the existing
> get_state_synchronize_rcu() may be used if future grace periods are
> inevitable (perhaps due to a later call_rcu() invocation).  The new
> start_poll_synchronize_rcu() is to be used if future grace periods
> might not otherwise happen.

By future grace period, you mean if a grace period has been started right
_before_ we start polling, right?


> Finally, poll_state_synchronize_rcu()
> provides a lockless check for a grace period having elapsed since
> the corresponding call to either of the get_state_synchronize_rcu()
> or start_poll_synchronize_rcu().
> 
> As with get_state_synchronize_rcu(), the return value from either
> get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> to a later call to either poll_state_synchronize_rcu() or the existing
> (might_sleep) cond_synchronize_rcu().
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[...]
>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> +	unsigned long flags;
> +	unsigned long gp_seq = get_state_synchronize_rcu();
> +	bool needwake;
> +	struct rcu_data *rdp;
> +	struct rcu_node *rnp;
> +
> +	lockdep_assert_irqs_enabled();
> +	local_irq_save(flags);
> +	rdp = this_cpu_ptr(&rcu_data);
> +	rnp = rdp->mynode;
> +	raw_spin_lock_rcu_node(rnp); // irqs already disabled.
> +	needwake = rcu_start_this_gp(rnp, rdp, gp_seq);

I'm a bit surprised we don't start a new grace period instead of snapshotting
the current one.

So if we do this:

   //start grace period gp_num=5

   old = p;
   rcu_assign_pointer(p, new);

   num = start_poll_synchronize_rcu(); // num = 5

   //grace period ends, start new gp_num=6

   poll_state_synchronize_rcu(num); // rcu seq is done

   kfree(old);

Isn't there a risk that other CPUs still see the old pointer?

Of course I know I'm missing something obvious :-)

Thanks.
Frederic Weisbecker March 12, 2021, 12:26 p.m. UTC | #2
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> +	unsigned long flags;
> +	unsigned long gp_seq = get_state_synchronize_rcu();

Ah! It's using rcu_seq_snap() and not rcu_seq_current() and therefore it's
waiting for a safe future grace period, right?

If so, please discard my previous email.

Thanks.
Paul E. McKenney March 15, 2021, 11:11 p.m. UTC | #3
On Fri, Mar 12, 2021 at 01:26:47PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> >  /**
> > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > + *
> > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> > + * or poll_state_synchronize_rcu() to determine whether or not a full
> > + * grace period has elapsed in the meantime.  If the needed grace period
> > + * is not already slated to start, notifies RCU core of the need for that
> > + * grace period.
> > + *
> > + * Interrupts must be enabled for the case where it is necessary to awaken
> > + * the grace-period kthread.
> > + */
> > +unsigned long start_poll_synchronize_rcu(void)
> > +{
> > +	unsigned long flags;
> > +	unsigned long gp_seq = get_state_synchronize_rcu();
> 
> Ah! It's using rcu_seq_snap() and not rcu_seq_current() and therefore it's
> waiting for a safe future grace period, right?

Exactly!  ;-)

							Thanx, Paul

> If so, please discard my previous email.
> 
> Thanks.
Frederic Weisbecker March 16, 2021, 2:47 p.m. UTC | #4
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> + *
> + * Yes, this function does not take counter wrap into account.
> + * But counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!).
> + * Those needing to keep oldstate values for very long time periods
> + * (many hours even on 32-bit systems) should check them occasionally
> + * and either refresh them or set a flag indicating that the grace period
> + * has completed.
> + */
> +bool poll_state_synchronize_rcu(unsigned long oldstate)
> +{
> +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> +		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> +		return true;
> +	}
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> +
> +/**
>   * cond_synchronize_rcu - Conditionally wait for an RCU grace period
>   *
>   * @oldstate: return value from earlier call to get_state_synchronize_rcu()
>   *
>   * If a full RCU grace period has elapsed since the earlier call to
> - * get_state_synchronize_rcu(), just return.  Otherwise, invoke
> - * synchronize_rcu() to wait for a full grace period.
> + * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
>   *
>   * Yes, this function does not take counter wrap into account.  But
>   * counter wrap is harmless.  If the counter wraps, we have waited for
> @@ -3804,7 +3864,7 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
>   */
>  void cond_synchronize_rcu(unsigned long oldstate)
>  {
> -	if (!rcu_seq_done(&rcu_state.gp_seq, oldstate))
> +	if (!poll_state_synchronize_rcu(oldstate))
>  		synchronize_rcu();
>  	else
>  		smp_mb(); /* Ensure GP ends before subsequent accesses. */

poll_state_synchronize_rcu() already does the full barrier.

> -- 
> 2.9.5
>
Frederic Weisbecker March 16, 2021, 3:17 p.m. UTC | #5
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> + *
> + * Yes, this function does not take counter wrap into account.
> + * But counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!).
> + * Those needing to keep oldstate values for very long time periods
> + * (many hours even on 32-bit systems) should check them occasionally
> + * and either refresh them or set a flag indicating that the grace period
> + * has completed.
> + */
> +bool poll_state_synchronize_rcu(unsigned long oldstate)
> +{
> +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> +		smp_mb(); /* Ensure GP ends before subsequent accesses. */

Also as usual I'm a bit lost with the reason behind those memory barriers.
So this is ordering the read on rcu_state.gp_seq against something (why not an
smp_rmb() btw?). And what does it pair with?

Thanks.
Paul E. McKenney March 16, 2021, 4:42 p.m. UTC | #6
On Tue, Mar 16, 2021 at 03:47:22PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > + *
> > + * Yes, this function does not take counter wrap into account.
> > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > + * Those needing to keep oldstate values for very long time periods
> > + * (many hours even on 32-bit systems) should check them occasionally
> > + * and either refresh them or set a flag indicating that the grace period
> > + * has completed.
> > + */
> > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > +{
> > +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> > +		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > +		return true;
> > +	}
> > +	return false;
> > +}
> > +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> > +
> > +/**
> >   * cond_synchronize_rcu - Conditionally wait for an RCU grace period
> >   *
> >   * @oldstate: return value from earlier call to get_state_synchronize_rcu()
> >   *
> >   * If a full RCU grace period has elapsed since the earlier call to
> > - * get_state_synchronize_rcu(), just return.  Otherwise, invoke
> > - * synchronize_rcu() to wait for a full grace period.
> > + * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> >   *
> >   * Yes, this function does not take counter wrap into account.  But
> >   * counter wrap is harmless.  If the counter wraps, we have waited for
> > @@ -3804,7 +3864,7 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
> >   */
> >  void cond_synchronize_rcu(unsigned long oldstate)
> >  {
> > -	if (!rcu_seq_done(&rcu_state.gp_seq, oldstate))
> > +	if (!poll_state_synchronize_rcu(oldstate))
> >  		synchronize_rcu();
> >  	else
> >  		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> 
> poll_state_synchronize_rcu() already does the full barrier.

Right you are!  Good catch, thank you!!!

I will fold removing this smp_mb() in with attribution.

							Thanx, Paul
Paul E. McKenney March 16, 2021, 4:51 p.m. UTC | #7
On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > + *
> > + * Yes, this function does not take counter wrap into account.
> > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > + * Those needing to keep oldstate values for very long time periods
> > + * (many hours even on 32-bit systems) should check them occasionally
> > + * and either refresh them or set a flag indicating that the grace period
> > + * has completed.
> > + */
> > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > +{
> > +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> > +		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> 
> Also as usual I'm a bit lost with the reason behind those memory barriers.
> So this is ordering the read on rcu_state.gp_seq against something (why not an
> smp_rmb() btw?). And what does it pair with?

Because it needs to order subsequent writes as well as reads.

It is ordering whatever the RCU user wishes to put after the call to
poll_state_synchronize_rcu() with whatever the RCU user put before
whatever started the grace period that just now completed.  Please
see the synchronize_rcu() comment header for the statement of the
guarantee.  Or that of call_rcu().

For more detail on how these guarantees are implemented, please see
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
and its many diagrams.

There are a lot of memory barriers that pair and form larger cycles to
implement this guarantee.  Pretty much all of the calls to the infamous
smp_mb__after_unlock_lock() macro form cycles involving this barrier,
for example.

Please do not hesitate to ask more questions.  This underpins RCU.

							Thanx, Paul
Frederic Weisbecker March 18, 2021, 2:59 p.m. UTC | #8
On Tue, Mar 16, 2021 at 09:51:01AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> > On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> > > +/**
> > > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > > + *
> > > + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> > > + *
> > > + * If a full RCU grace period has elapsed since the earlier call from
> > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > + *
> > > + * Yes, this function does not take counter wrap into account.
> > > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > > + * Those needing to keep oldstate values for very long time periods
> > > + * (many hours even on 32-bit systems) should check them occasionally
> > > + * and either refresh them or set a flag indicating that the grace period
> > > + * has completed.
> > > + */
> > > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > > +{
> > > +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> > > +		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > 
> > Also as usual I'm a bit lost with the reason behind those memory barriers.
> > So this is ordering the read on rcu_state.gp_seq against something (why not an
> > smp_rmb() btw?). And what does it pair with?
> 
> Because it needs to order subsequent writes as well as reads.
> 
> It is ordering whatever the RCU user wishes to put after the call to
> poll_state_synchronize_rcu() with whatever the RCU user put before
> whatever started the grace period that just now completed.  Please
> see the synchronize_rcu() comment header for the statement of the
> guarantee.  Or that of call_rcu().

I see. OTOH the update side's CPU had to report a quiescent state for the
requested grace period to complete. As the quiescent state propagated along
with full ordering up to the root rnp, everything that happened before
rcu_seq_done() should appear before and everything that happened after
rcu_seq_done() should appear after.

Now in the case the update side's CPU is not the last CPU that reported
a quiescent state (and thus not the one that propagated every subsequent
CPUs QS to the final "rcu_state.gp_seq"), the full barrier after rcu_seq_done()
is necessary to order against all the CPUs that reported a QS after the
update side's CPU.

Is that right?


> 
> For more detail on how these guarantees are implemented, please see
> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> and its many diagrams.

Indeed, very useful documentation!

> 
> There are a lot of memory barriers that pair and form larger cycles to
> implement this guarantee.  Pretty much all of the calls to the infamous
> smp_mb__after_unlock_lock() macro form cycles involving this barrier,
> for example.
> 
> Please do not hesitate to ask more questions.  This underpins RCU.

Careful what you wish! ;-)

Thanks.
Paul E. McKenney March 18, 2021, 5:09 p.m. UTC | #9
On Thu, Mar 18, 2021 at 03:59:52PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 09:51:01AM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> > > > +/**
> > > > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > > > + *
> > > > + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> > > > + *
> > > > + * If a full RCU grace period has elapsed since the earlier call from
> > > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > > + *
> > > > + * Yes, this function does not take counter wrap into account.
> > > > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > > > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > > > + * Those needing to keep oldstate values for very long time periods
> > > > + * (many hours even on 32-bit systems) should check them occasionally
> > > > + * and either refresh them or set a flag indicating that the grace period
> > > > + * has completed.
> > > > + */
> > > > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > > > +{
> > > > +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> > > > +		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > > 
> > > Also as usual I'm a bit lost with the reason behind those memory barriers.
> > > So this is ordering the read on rcu_state.gp_seq against something (why not an
> > > smp_rmb() btw?). And what does it pair with?
> > 
> > Because it needs to order subsequent writes as well as reads.
> > 
> > It is ordering whatever the RCU user wishes to put after the call to
> > poll_state_synchronize_rcu() with whatever the RCU user put before
> > whatever started the grace period that just now completed.  Please
> > see the synchronize_rcu() comment header for the statement of the
> > guarantee.  Or that of call_rcu().
> 
> I see. OTOH the update side's CPU had to report a quiescent state for the
> requested grace period to complete. As the quiescent state propagated along
> with full ordering up to the root rnp, everything that happened before
> rcu_seq_done() should appear before and everything that happened after
> rcu_seq_done() should appear after.
> 
> Now in the case the update side's CPU is not the last CPU that reported
> a quiescent state (and thus not the one that propagated every subsequent
> CPUs QS to the final "rcu_state.gp_seq"), the full barrier after rcu_seq_done()
> is necessary to order against all the CPUs that reported a QS after the
> update side's CPU.
> 
> Is that right?

That is the way I see it.  ;-)

> > For more detail on how these guarantees are implemented, please see
> > Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > and its many diagrams.
> 
> Indeed, very useful documentation!

Glad you like it!

> > There are a lot of memory barriers that pair and form larger cycles to
> > implement this guarantee.  Pretty much all of the calls to the infamous
> > smp_mb__after_unlock_lock() macro form cycles involving this barrier,
> > for example.
> > 
> > Please do not hesitate to ask more questions.  This underpins RCU.
> 
> Careful what you wish! ;-)

;-) ;-) ;-)

							Thanx, Paul
Frederic Weisbecker March 19, 2021, 1:58 p.m. UTC | #10
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There is a need for a non-blocking polling interface for RCU grace
> periods, so this commit supplies start_poll_synchronize_rcu() and
> poll_state_synchronize_rcu() for this purpose.  Note that the existing
> get_state_synchronize_rcu() may be used if future grace periods are
> inevitable (perhaps due to a later call_rcu() invocation).  The new
> start_poll_synchronize_rcu() is to be used if future grace periods
> might not otherwise happen.  Finally, poll_state_synchronize_rcu()
> provides a lockless check for a grace period having elapsed since
> the corresponding call to either of the get_state_synchronize_rcu()
> or start_poll_synchronize_rcu().
> 
> As with get_state_synchronize_rcu(), the return value from either
> get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> to a later call to either poll_state_synchronize_rcu() or the existing
> (might_sleep) cond_synchronize_rcu().

It's all a matter of personal taste but if I may suggest some namespace
modifications:

get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
poll_state_synchronize_rcu() -> synchronize_rcu_poll()
cond_synchronize_rcu() -> synchronize_rcu_cond()

But it's up to you really.

>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()

It may be worth noting that calling start_poll_synchronize_rcu() and then
pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
one more grace period.

> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> +	unsigned long flags;
> +	unsigned long gp_seq = get_state_synchronize_rcu();
> +	bool needwake;
> +	struct rcu_data *rdp;
> +	struct rcu_node *rnp;
[...]
> +
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.

Rephrase suggestion for the last sentence:

"In case of failure, it's up to the caller to try polling again later or
invoke synchronize_rcu() to wait for a new full grace period to complete."


In any case: Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!
Paul E. McKenney March 19, 2021, 5:51 p.m. UTC | #11
On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There is a need for a non-blocking polling interface for RCU grace
> > periods, so this commit supplies start_poll_synchronize_rcu() and
> > poll_state_synchronize_rcu() for this purpose.  Note that the existing
> > get_state_synchronize_rcu() may be used if future grace periods are
> > inevitable (perhaps due to a later call_rcu() invocation).  The new
> > start_poll_synchronize_rcu() is to be used if future grace periods
> > might not otherwise happen.  Finally, poll_state_synchronize_rcu()
> > provides a lockless check for a grace period having elapsed since
> > the corresponding call to either of the get_state_synchronize_rcu()
> > or start_poll_synchronize_rcu().
> > 
> > As with get_state_synchronize_rcu(), the return value from either
> > get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> > to a later call to either poll_state_synchronize_rcu() or the existing
> > (might_sleep) cond_synchronize_rcu().
> 
> It's all a matter of personal taste but if I may suggest some namespace
> modifications:
> 
> get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> cond_synchronize_rcu() -> synchronize_rcu_cond()
> 
> But it's up to you really.

I am concerned about starting anything "synchronize_rcu" if that
thing doesn't unconditionally wait for a grace period.  "What do
you mean that there was no grace period?  Don't you see that call to
synchronize_rcu_poll_start_raw()???"

This objection doesn't apply to cond_synchronize_rcu(), but it is
already in use, so any name change should be worked with the users.
All two of them.  ;-)

> >  /**
> > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > + *
> > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> 
> It may be worth noting that calling start_poll_synchronize_rcu() and then
> pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
> one more grace period.

You mean this sequence of events?

1.	cookie = start_poll_synchronize_rcu()

2.	The grace period corresponding to cookie is almost over...

3.	cond_synchronize_rcu() checks the cookie and sees that the
	grace period has not yet expired.

4.	The grace period corresponding to cookie completes.

5.	Someone else starts a grace period.

6.	cond_synchronize_rcu() invokes synchronize_rcu(), which waits
	for the just-started grace period plus another grace period.
	Thus, there has been no fewer than three full grace periods
	between the call to start_poll_synchronize_rcu() and the
	return from cond_synchronize_rcu().

Yes, this can happen!  And it can be worse, for example, it is quite
possible that cond_synchronize_rcu() would be preempted for multiple
grace periods at step 5, in which case it would still wait for almost
two additional grace periods.

Or are you thinking of something else?

> > + * or poll_state_synchronize_rcu() to determine whether or not a full
> > + * grace period has elapsed in the meantime.  If the needed grace period
> > + * is not already slated to start, notifies RCU core of the need for that
> > + * grace period.
> > + *
> > + * Interrupts must be enabled for the case where it is necessary to awaken
> > + * the grace-period kthread.
> > + */
> > +unsigned long start_poll_synchronize_rcu(void)
> > +{
> > +	unsigned long flags;
> > +	unsigned long gp_seq = get_state_synchronize_rcu();
> > +	bool needwake;
> > +	struct rcu_data *rdp;
> > +	struct rcu_node *rnp;
> [...]
> > +
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> 
> Rephrase suggestion for the last sentence:
> 
> "In case of failure, it's up to the caller to try polling again later or
> invoke synchronize_rcu() to wait for a new full grace period to complete."

How about like this?

/**
 * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
 *
 * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
 *
 * If a full RCU grace period has elapsed since the earlier call from
 * which oldstate was obtained, return @true, otherwise return @false.
 * If @false is returned, it is the caller's responsibilty to invoke this
 * function later on until it does return @true.  Alternatively, the caller
 * can explicitly wait for a grace period, for example, by passing @oldstate
 * to cond_synchronize_rcu() or by directly invoking synchronize_rcu().
 *
 * Yes, this function does not take counter wrap into account.
 * But counter wrap is harmless.  If the counter wraps, we have waited for
 * more than 2 billion grace periods (and way more on a 64-bit system!).
 * Those needing to keep oldstate values for very long time periods
 * (many hours even on 32-bit systems) should check them occasionally
 * and either refresh them or set a flag indicating that the grace period
 * has completed.
 */

> In any case: Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thank you, I will apply it at the next rebase.

							Thanx, Paul
Frederic Weisbecker March 19, 2021, 10:10 p.m. UTC | #12
On Fri, Mar 19, 2021 at 10:51:16AM -0700, Paul E. McKenney wrote:
> On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> > It's all a matter of personal taste but if I may suggest some namespace
> > modifications:
> > 
> > get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> > start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> > poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> > cond_synchronize_rcu() -> synchronize_rcu_cond()
> > 
> > But it's up to you really.
> 
> I am concerned about starting anything "synchronize_rcu" if that
> thing doesn't unconditionally wait for a grace period.  "What do
> you mean that there was no grace period?  Don't you see that call to
> synchronize_rcu_poll_start_raw()???"

I see, that could indeed be confusing.

> 
> This objection doesn't apply to cond_synchronize_rcu(), but it is
> already in use, so any name change should be worked with the users.
> All two of them.  ;-)

Probably not worth it. We have cond_resched() as a schedule() counterpart
for a reference after all.

> > >  /**
> > > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > > + *
> > > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> > 
> > It may be worth noting that calling start_poll_synchronize_rcu() and then
> > pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
> > one more grace period.
> 
> You mean this sequence of events?
> 
> 1.	cookie = start_poll_synchronize_rcu()
> 
> 2.	The grace period corresponding to cookie is almost over...
> 
> 3.	cond_synchronize_rcu() checks the cookie and sees that the
> 	grace period has not yet expired.
> 
> 4.	The grace period corresponding to cookie completes.
> 
> 5.	Someone else starts a grace period.
> 
> 6.	cond_synchronize_rcu() invokes synchronize_rcu(), which waits
> 	for the just-started grace period plus another grace period.
> 	Thus, there has been no fewer than three full grace periods
> 	between the call to start_poll_synchronize_rcu() and the
> 	return from cond_synchronize_rcu().
> 
> Yes, this can happen!  And it can be worse, for example, it is quite
> possible that cond_synchronize_rcu() would be preempted for multiple
> grace periods at step 5, in which case it would still wait for almost
> two additional grace periods.
> 
> Or are you thinking of something else?

I didn't even think that far.
My scenario was:

1.	cookie = start_poll_synchronize_rcu()
 
 
2.	cond_synchronize_rcu() checks the cookie and sees that the
	grace period has not yet expired. So it calls synchronize_rcu()
	which queues a callback.

3.	The grace period for the cookie eventually completes.

4.	The callback queued in 2. gets assigned a new grace period number.
	That new grace period starts.

5.	The new grace period completes and synchronize_rcu() returns.


But I think this is due to some deep misunderstanding from my end.


> > > + * If a full RCU grace period has elapsed since the earlier call from
> > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > 
> > Rephrase suggestion for the last sentence:
> > 
> > "In case of failure, it's up to the caller to try polling again later or
> > invoke synchronize_rcu() to wait for a new full grace period to complete."
> 
> How about like this?
> 
> /**
>  * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
>  *
>  * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
>  *
>  * If a full RCU grace period has elapsed since the earlier call from
>  * which oldstate was obtained, return @true, otherwise return @false.
>  * If @false is returned, it is the caller's responsibilty to invoke this
>  * function later on until it does return @true.  Alternatively, the caller
>  * can explicitly wait for a grace period, for example, by passing @oldstate
>  * to cond_synchronize_rcu() or by directly invoking synchronize_rcu().

Yes very nice!

Thanks!
Paul E. McKenney March 19, 2021, 11:38 p.m. UTC | #13
On Fri, Mar 19, 2021 at 11:10:40PM +0100, Frederic Weisbecker wrote:
> On Fri, Mar 19, 2021 at 10:51:16AM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> > > It's all a matter of personal taste but if I may suggest some namespace
> > > modifications:
> > > 
> > > get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> > > start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> > > poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> > > cond_synchronize_rcu() -> synchronize_rcu_cond()
> > > 
> > > But it's up to you really.
> > 
> > I am concerned about starting anything "synchronize_rcu" if that
> > thing doesn't unconditionally wait for a grace period.  "What do
> > you mean that there was no grace period?  Don't you see that call to
> > synchronize_rcu_poll_start_raw()???"
> 
> I see, that could indeed be confusing.
> 
> > This objection doesn't apply to cond_synchronize_rcu(), but it is
> > already in use, so any name change should be worked with the users.
> > All two of them.  ;-)
> 
> Probably not worth it. We have cond_resched() as a schedule() counterpart
> for a reference after all.

Good point!

> > > >  /**
> > > > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > > > + *
> > > > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> > > 
> > > It may be worth noting that calling start_poll_synchronize_rcu() and then
> > > pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
> > > one more grace period.
> > 
> > You mean this sequence of events?
> > 
> > 1.	cookie = start_poll_synchronize_rcu()
> > 
> > 2.	The grace period corresponding to cookie is almost over...
> > 
> > 3.	cond_synchronize_rcu() checks the cookie and sees that the
> > 	grace period has not yet expired.
> > 
> > 4.	The grace period corresponding to cookie completes.
> > 
> > 5.	Someone else starts a grace period.
> > 
> > 6.	cond_synchronize_rcu() invokes synchronize_rcu(), which waits
> > 	for the just-started grace period plus another grace period.
> > 	Thus, there has been no fewer than three full grace periods
> > 	between the call to start_poll_synchronize_rcu() and the
> > 	return from cond_synchronize_rcu().
> > 
> > Yes, this can happen!  And it can be worse, for example, it is quite
> > possible that cond_synchronize_rcu() would be preempted for multiple
> > grace periods at step 5, in which case it would still wait for almost
> > two additional grace periods.
> > 
> > Or are you thinking of something else?
> 
> I didn't even think that far.
> My scenario was:
> 
> 1.	cookie = start_poll_synchronize_rcu()
>  
>  
> 2.	cond_synchronize_rcu() checks the cookie and sees that the
> 	grace period has not yet expired. So it calls synchronize_rcu()
> 	which queues a callback.
> 
> 3.	The grace period for the cookie eventually completes.
> 
> 4.	The callback queued in 2. gets assigned a new grace period number.
> 	That new grace period starts.
> 
> 5.	The new grace period completes and synchronize_rcu() returns.
> 
> 
> But I think this is due to some deep misunderstanding from my end.

You mean like this?

	oldstate = start_poll_synchronize_rcu();
	// Why wait?  Beat the rush!!!
	cond_synchronize_rcu(oldstate);

This would be a bit silly (why not just call synchronize_rcu()?),
and yes, this would unconditionally get you an extra RCU grace period.
Then again, any call to cond_synchronize_rcu() before the desired grace
period has expired will get you an extra grace period, and maybe more.

So a given use case either needs to not care about the added latency
or have a high probability of invoking cond_synchronize_rcu() after
the desired grace period has expired.

> > > > + * If a full RCU grace period has elapsed since the earlier call from
> > > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > 
> > > Rephrase suggestion for the last sentence:
> > > 
> > > "In case of failure, it's up to the caller to try polling again later or
> > > invoke synchronize_rcu() to wait for a new full grace period to complete."
> > 
> > How about like this?
> > 
> > /**
> >  * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> >  *
> >  * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
> >  *
> >  * If a full RCU grace period has elapsed since the earlier call from
> >  * which oldstate was obtained, return @true, otherwise return @false.
> >  * If @false is returned, it is the caller's responsibilty to invoke this
> >  * function later on until it does return @true.  Alternatively, the caller
> >  * can explicitly wait for a grace period, for example, by passing @oldstate
> >  * to cond_synchronize_rcu() or by directly invoking synchronize_rcu().
> 
> Yes very nice!

You got it!

							Thanx, Paul
Frederic Weisbecker March 19, 2021, 11:47 p.m. UTC | #14
On Fri, Mar 19, 2021 at 04:38:48PM -0700, Paul E. McKenney wrote:
> > I didn't even think that far.
> > My scenario was:
> > 
> > 1.	cookie = start_poll_synchronize_rcu()
> >  
> >  
> > 2.	cond_synchronize_rcu() checks the cookie and sees that the
> > 	grace period has not yet expired. So it calls synchronize_rcu()
> > 	which queues a callback.
> > 
> > 3.	The grace period for the cookie eventually completes.
> > 
> > 4.	The callback queued in 2. gets assigned a new grace period number.
> > 	That new grace period starts.
> > 
> > 5.	The new grace period completes and synchronize_rcu() returns.
> > 
> > 
> > But I think this is due to some deep misunderstanding from my end.
> 
> You mean like this?
> 
> 	oldstate = start_poll_synchronize_rcu();
> 	// Why wait?  Beat the rush!!!
> 	cond_synchronize_rcu(oldstate);
> 
> This would be a bit silly (why not just call synchronize_rcu()?),
> and yes, this would unconditionally get you an extra RCU grace period.
> Then again, any call to cond_synchronize_rcu() before the desired grace
> period has expired will get you an extra grace period, and maybe more.
> 
> So a given use case either needs to not care about the added latency
> or have a high probability of invoking cond_synchronize_rcu() after
> the desired grace period has expired.

Fair point!

Thanks.

Patch
diff mbox series

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index df578b7..b89b541 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -41,6 +41,8 @@  void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
 unsigned long get_state_synchronize_rcu(void);
+unsigned long start_poll_synchronize_rcu(void);
+bool poll_state_synchronize_rcu(unsigned long oldstate);
 void cond_synchronize_rcu(unsigned long oldstate);
 
 void rcu_idle_enter(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da6f521..8e0a140 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3774,8 +3774,8 @@  EXPORT_SYMBOL_GPL(synchronize_rcu);
  * get_state_synchronize_rcu - Snapshot current RCU state
  *
  * Returns a cookie that is used by a later call to cond_synchronize_rcu()
- * to determine whether or not a full grace period has elapsed in the
- * meantime.
+ * or poll_state_synchronize_rcu() to determine whether or not a full
+ * grace period has elapsed in the meantime.
  */
 unsigned long get_state_synchronize_rcu(void)
 {
@@ -3789,13 +3789,73 @@  unsigned long get_state_synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
 
 /**
+ * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
+ *
+ * Returns a cookie that is used by a later call to cond_synchronize_rcu()
+ * or poll_state_synchronize_rcu() to determine whether or not a full
+ * grace period has elapsed in the meantime.  If the needed grace period
+ * is not already slated to start, notifies RCU core of the need for that
+ * grace period.
+ *
+ * Interrupts must be enabled for the case where it is necessary to awaken
+ * the grace-period kthread.
+ */
+unsigned long start_poll_synchronize_rcu(void)
+{
+	unsigned long flags;
+	unsigned long gp_seq = get_state_synchronize_rcu();
+	bool needwake;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+
+	lockdep_assert_irqs_enabled();
+	local_irq_save(flags);
+	rdp = this_cpu_ptr(&rcu_data);
+	rnp = rdp->mynode;
+	raw_spin_lock_rcu_node(rnp); // irqs already disabled.
+	needwake = rcu_start_this_gp(rnp, rdp, gp_seq);
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	if (needwake)
+		rcu_gp_kthread_wake();
+	return gp_seq;
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
+
+/**
+ * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
+ *
+ * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu()
+ *
+ * If a full RCU grace period has elapsed since the earlier call from
+ * which oldstate was obtained, return @true, otherwise return @false.
+ * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
+ *
+ * Yes, this function does not take counter wrap into account.
+ * But counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!).
+ * Those needing to keep oldstate values for very long time periods
+ * (many hours even on 32-bit systems) should check them occasionally
+ * and either refresh them or set a flag indicating that the grace period
+ * has completed.
+ */
+bool poll_state_synchronize_rcu(unsigned long oldstate)
+{
+	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
+		smp_mb(); /* Ensure GP ends before subsequent accesses. */
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
+
+/**
  * cond_synchronize_rcu - Conditionally wait for an RCU grace period
  *
  * @oldstate: return value from earlier call to get_state_synchronize_rcu()
  *
  * If a full RCU grace period has elapsed since the earlier call to
- * get_state_synchronize_rcu(), just return.  Otherwise, invoke
- * synchronize_rcu() to wait for a full grace period.
+ * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return.
+ * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
  *
  * Yes, this function does not take counter wrap into account.  But
  * counter wrap is harmless.  If the counter wraps, we have waited for
@@ -3804,7 +3864,7 @@  EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
  */
 void cond_synchronize_rcu(unsigned long oldstate)
 {
-	if (!rcu_seq_done(&rcu_state.gp_seq, oldstate))
+	if (!poll_state_synchronize_rcu(oldstate))
 		synchronize_rcu();
 	else
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */