tasks-trace RCU: question about grace period forward progress

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* tasks-trace RCU: question about grace period forward progress
@ 2021-02-25 14:22 Mathieu Desnoyers
  2021-02-25 15:36 ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-02-25 14:22 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

Hi Paul,

Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this:

static inline void rcu_read_lock_trace(void)
{
        struct task_struct *t = current;

        WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
        barrier();
        if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
            t->trc_reader_special.b.need_mb)
                smp_mb(); // Pairs with update-side barriers
        rcu_lock_acquire(&rcu_trace_lock_map);
}

static inline void rcu_read_unlock_trace(void)
{
        int nesting;
        struct task_struct *t = current;

        rcu_lock_release(&rcu_trace_lock_map);
        nesting = READ_ONCE(t->trc_reader_nesting) - 1;
        barrier(); // Critical section before disabling.
        // Disable IPI-based setting of .need_qs.
        WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
        if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
                WRITE_ONCE(t->trc_reader_nesting, nesting);
                return;  // We assume shallow reader nesting.
        }
        rcu_read_unlock_trace_special(t, nesting);
}

AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical
section with a counter, and grace periods iterate over all threads to make sure they
are not within a read-side critical section before they can complete:

# define rcu_tasks_trace_qs(t)                                          \
        do {                                                            \
                if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
                    !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
                        smp_store_release(&(t)->trc_reader_checked, true); \
                        smp_mb(); /* Readers partitioned by store. */   \
                }                                                       \
        } while (0)

It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
state to track whether threads are nested within a critical section:

https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125

static inline void _urcu_mb_read_lock_update(unsigned long tmp)
{
	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
		cmm_smp_mb();
	} else
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
}

static inline void _urcu_mb_read_lock(void)
{
	unsigned long tmp;

	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
	cmm_barrier();
	tmp = URCU_TLS(urcu_mb_reader).ctr;
	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
	_urcu_mb_read_lock_update(tmp);
}

The main difference between the two algorithm is that task-trace within the
kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
incremented or flipped between 0 and 1 by the grace period. This allow RCU readers
outermost nesting starting after the beginning of the grace period not to prevent
progress of the grace period.

Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
grace period from ever completing.

Or is this handled in a clever way that I am missing here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 14:22 tasks-trace RCU: question about grace period forward progress Mathieu Desnoyers
@ 2021-02-25 15:36 ` Paul E. McKenney
  2021-02-25 15:47   ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2021-02-25 15:36 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote:
> Hi Paul,
> 
> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this:
> 
> static inline void rcu_read_lock_trace(void)
> {
>         struct task_struct *t = current;
> 
>         WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
>         barrier();
>         if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
>             t->trc_reader_special.b.need_mb)
>                 smp_mb(); // Pairs with update-side barriers
>         rcu_lock_acquire(&rcu_trace_lock_map);
> }
> 
> static inline void rcu_read_unlock_trace(void)
> {
>         int nesting;
>         struct task_struct *t = current;
> 
>         rcu_lock_release(&rcu_trace_lock_map);
>         nesting = READ_ONCE(t->trc_reader_nesting) - 1;
>         barrier(); // Critical section before disabling.
>         // Disable IPI-based setting of .need_qs.
>         WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
>         if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
>                 WRITE_ONCE(t->trc_reader_nesting, nesting);
>                 return;  // We assume shallow reader nesting.
>         }
>         rcu_read_unlock_trace_special(t, nesting);
> }
> 
> AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical
> section with a counter, and grace periods iterate over all threads to make sure they
> are not within a read-side critical section before they can complete:
> 
> # define rcu_tasks_trace_qs(t)                                          \
>         do {                                                            \
>                 if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
>                     !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
>                         smp_store_release(&(t)->trc_reader_checked, true); \
>                         smp_mb(); /* Readers partitioned by store. */   \
>                 }                                                       \
>         } while (0)
> 
> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
> state to track whether threads are nested within a critical section:
> 
> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
> 
> static inline void _urcu_mb_read_lock_update(unsigned long tmp)
> {
> 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
> 		cmm_smp_mb();
> 	} else
> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
> }
> 
> static inline void _urcu_mb_read_lock(void)
> {
> 	unsigned long tmp;
> 
> 	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
> 	cmm_barrier();
> 	tmp = URCU_TLS(urcu_mb_reader).ctr;
> 	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
> 	_urcu_mb_read_lock_update(tmp);
> }
> 
> The main difference between the two algorithm is that task-trace within the
> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
> incremented or flipped between 0 and 1 by the grace period. This allow RCU readers
> outermost nesting starting after the beginning of the grace period not to prevent
> progress of the grace period.
> 
> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
> grace period from ever completing.
> 
> Or is this handled in a clever way that I am missing here ?

There are several mechanisms designed to handle this.  The following
paragraphs describe these at a high level.

The trc_wait_for_one_reader() is invoked on each task.  It uses the
try_invoke_on_locked_down_task(), which, if the task is currently not
running, keeps it that way and invokes trc_inspect_reader().  If the
locked-down task is in a read-side critical section, the need_qs field
is set, which will cause the task's next rcu_read_lock_trace() to report
the quiescent state.

If read-side memory barriers have been enabled, trc_inspect_reader()
is able to check for a reader being active, and if not, reports the
quiescent state.  If there is a reader, trc_inspect_reader() reports
failure, which is another path to the following paragraph.

If the task could not be locked down due its currently running,
then trc_wait_for_one_reader() attempts to send an IPI, which results in
trc_read_check_handler() rechecking for a read-side critical section
and either reporting the quiescent state immediately or proceding in the
same way that trc_inspect_reader() does.  The trc_read_check_handler()
of course checks to make sure that the target task is still running
before doing anything.  If the attempt to send the IPI fails, then
the task is rechecked in a later pass.

So what sequence of events did you find that causes these mechanisms
to fail?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 15:36 ` Paul E. McKenney
@ 2021-02-25 15:47   ` Mathieu Desnoyers
  2021-02-25 18:33     ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-02-25 15:47 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

----- On Feb 25, 2021, at 10:36 AM, paulmck paulmck@kernel.org wrote:

> On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote:
>> Hi Paul,
>> 
>> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(),
>> and I see this:
>> 
>> static inline void rcu_read_lock_trace(void)
>> {
>>         struct task_struct *t = current;
>> 
>>         WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
>>         barrier();
>>         if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
>>             t->trc_reader_special.b.need_mb)
>>                 smp_mb(); // Pairs with update-side barriers
>>         rcu_lock_acquire(&rcu_trace_lock_map);
>> }
>> 
>> static inline void rcu_read_unlock_trace(void)
>> {
>>         int nesting;
>>         struct task_struct *t = current;
>> 
>>         rcu_lock_release(&rcu_trace_lock_map);
>>         nesting = READ_ONCE(t->trc_reader_nesting) - 1;
>>         barrier(); // Critical section before disabling.
>>         // Disable IPI-based setting of .need_qs.
>>         WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
>>         if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
>>                 WRITE_ONCE(t->trc_reader_nesting, nesting);
>>                 return;  // We assume shallow reader nesting.
>>         }
>>         rcu_read_unlock_trace_special(t, nesting);
>> }
>> 
>> AFAIU, each thread keeps track of whether it is nested within a RCU read-side
>> critical
>> section with a counter, and grace periods iterate over all threads to make sure
>> they
>> are not within a read-side critical section before they can complete:
>> 
>> # define rcu_tasks_trace_qs(t)                                          \
>>         do {                                                            \
>>                 if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
>>                     !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
>>                         smp_store_release(&(t)->trc_reader_checked, true); \
>>                         smp_mb(); /* Readers partitioned by store. */   \
>>                 }                                                       \
>>         } while (0)
>> 
>> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
>> state to track whether threads are nested within a critical section:
>> 
>> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
>> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
>> 
>> static inline void _urcu_mb_read_lock_update(unsigned long tmp)
>> {
>> 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
>> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr,
>> 		_CMM_LOAD_SHARED(urcu_mb_gp.ctr));
>> 		cmm_smp_mb();
>> 	} else
>> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
>> }
>> 
>> static inline void _urcu_mb_read_lock(void)
>> {
>> 	unsigned long tmp;
>> 
>> 	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
>> 	cmm_barrier();
>> 	tmp = URCU_TLS(urcu_mb_reader).ctr;
>> 	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
>> 	_urcu_mb_read_lock_update(tmp);
>> }
>> 
>> The main difference between the two algorithm is that task-trace within the
>> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
>> incremented or flipped between 0 and 1 by the grace period. This allow RCU
>> readers
>> outermost nesting starting after the beginning of the grace period not to
>> prevent
>> progress of the grace period.
>> 
>> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
>> grace period from ever completing.
>> 
>> Or is this handled in a clever way that I am missing here ?
> 
> There are several mechanisms designed to handle this.  The following
> paragraphs describe these at a high level.
> 
> The trc_wait_for_one_reader() is invoked on each task.  It uses the
> try_invoke_on_locked_down_task(), which, if the task is currently not
> running, keeps it that way and invokes trc_inspect_reader().  If the
> locked-down task is in a read-side critical section, the need_qs field
> is set, which will cause the task's next rcu_read_lock_trace() to report
> the quiescent state.

I suspect you meant "rcu_read_unlock_trace()" here.

> 
> If read-side memory barriers have been enabled, trc_inspect_reader()
> is able to check for a reader being active, and if not, reports the
> quiescent state.  If there is a reader, trc_inspect_reader() reports
> failure, which is another path to the following paragraph.
> 
> If the task could not be locked down due its currently running,
> then trc_wait_for_one_reader() attempts to send an IPI, which results in
> trc_read_check_handler() rechecking for a read-side critical section
> and either reporting the quiescent state immediately or proceding in the
> same way that trc_inspect_reader() does.  The trc_read_check_handler()
> of course checks to make sure that the target task is still running
> before doing anything.  If the attempt to send the IPI fails, then
> the task is rechecked in a later pass.
> 
> So what sequence of events did you find that causes these mechanisms
> to fail?

The explanation you provide takes care of my concerns, so I don't have
any remaining problematic scenario in mind.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 15:47   ` Mathieu Desnoyers
@ 2021-02-25 18:33     ` Paul E. McKenney
  2021-02-25 20:20       ` Mathieu Desnoyers
  2021-02-25 22:23       ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-02-25 18:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

On Thu, Feb 25, 2021 at 10:47:32AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 25, 2021, at 10:36 AM, paulmck paulmck@kernel.org wrote:
> 
> > On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote:
> >> Hi Paul,
> >> 
> >> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(),
> >> and I see this:
> >> 
> >> static inline void rcu_read_lock_trace(void)
> >> {
> >>         struct task_struct *t = current;
> >> 
> >>         WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
> >>         barrier();
> >>         if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
> >>             t->trc_reader_special.b.need_mb)
> >>                 smp_mb(); // Pairs with update-side barriers
> >>         rcu_lock_acquire(&rcu_trace_lock_map);
> >> }
> >> 
> >> static inline void rcu_read_unlock_trace(void)
> >> {
> >>         int nesting;
> >>         struct task_struct *t = current;
> >> 
> >>         rcu_lock_release(&rcu_trace_lock_map);
> >>         nesting = READ_ONCE(t->trc_reader_nesting) - 1;
> >>         barrier(); // Critical section before disabling.
> >>         // Disable IPI-based setting of .need_qs.
> >>         WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
> >>         if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
> >>                 WRITE_ONCE(t->trc_reader_nesting, nesting);
> >>                 return;  // We assume shallow reader nesting.
> >>         }
> >>         rcu_read_unlock_trace_special(t, nesting);
> >> }
> >> 
> >> AFAIU, each thread keeps track of whether it is nested within a RCU read-side
> >> critical
> >> section with a counter, and grace periods iterate over all threads to make sure
> >> they
> >> are not within a read-side critical section before they can complete:
> >> 
> >> # define rcu_tasks_trace_qs(t)                                          \
> >>         do {                                                            \
> >>                 if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
> >>                     !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
> >>                         smp_store_release(&(t)->trc_reader_checked, true); \
> >>                         smp_mb(); /* Readers partitioned by store. */   \
> >>                 }                                                       \
> >>         } while (0)
> >> 
> >> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
> >> state to track whether threads are nested within a critical section:
> >> 
> >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
> >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
> >> 
> >> static inline void _urcu_mb_read_lock_update(unsigned long tmp)
> >> {
> >> 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
> >> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr,
> >> 		_CMM_LOAD_SHARED(urcu_mb_gp.ctr));
> >> 		cmm_smp_mb();
> >> 	} else
> >> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
> >> }
> >> 
> >> static inline void _urcu_mb_read_lock(void)
> >> {
> >> 	unsigned long tmp;
> >> 
> >> 	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
> >> 	cmm_barrier();
> >> 	tmp = URCU_TLS(urcu_mb_reader).ctr;
> >> 	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
> >> 	_urcu_mb_read_lock_update(tmp);
> >> }
> >> 
> >> The main difference between the two algorithm is that task-trace within the
> >> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
> >> incremented or flipped between 0 and 1 by the grace period. This allow RCU
> >> readers
> >> outermost nesting starting after the beginning of the grace period not to
> >> prevent
> >> progress of the grace period.
> >> 
> >> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
> >> grace period from ever completing.
> >> 
> >> Or is this handled in a clever way that I am missing here ?
> > 
> > There are several mechanisms designed to handle this.  The following
> > paragraphs describe these at a high level.
> > 
> > The trc_wait_for_one_reader() is invoked on each task.  It uses the
> > try_invoke_on_locked_down_task(), which, if the task is currently not
> > running, keeps it that way and invokes trc_inspect_reader().  If the
> > locked-down task is in a read-side critical section, the need_qs field
> > is set, which will cause the task's next rcu_read_lock_trace() to report
> > the quiescent state.
> 
> I suspect you meant "rcu_read_unlock_trace()" here.

You are quite correct, apologies for my early morning confusion!

> > If read-side memory barriers have been enabled, trc_inspect_reader()
> > is able to check for a reader being active, and if not, reports the
> > quiescent state.  If there is a reader, trc_inspect_reader() reports
> > failure, which is another path to the following paragraph.
> > 
> > If the task could not be locked down due its currently running,
> > then trc_wait_for_one_reader() attempts to send an IPI, which results in
> > trc_read_check_handler() rechecking for a read-side critical section
> > and either reporting the quiescent state immediately or proceding in the
> > same way that trc_inspect_reader() does.  The trc_read_check_handler()
> > of course checks to make sure that the target task is still running
> > before doing anything.  If the attempt to send the IPI fails, then
> > the task is rechecked in a later pass.
> > 
> > So what sequence of events did you find that causes these mechanisms
> > to fail?
> 
> The explanation you provide takes care of my concerns, so I don't have
> any remaining problematic scenario in mind.

Would the block comment added by the below patch have helped?

One question for Peter...  Does each and every context switch imply a
full barrier?

I am pretty sure that it does, but figured that this was a good time
to double-check, given that RCU Tasks Trace assumes this.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 581f79546b6be406a9c7280b2d3511b60821efe0
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Feb 25 10:26:00 2021 -0800

    rcu-tasks: Add block comment laying out RCU Tasks Trace design
    
    This commit adds a block comment that gives a high-level overview of
    how RCU tasks trace grace periods progress.  It also adds a note about
    how exiting tasks are handles, plus it gives an overview of the memory
    ordering.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 17c8ebe..f818357 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
 // flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
 // readers can operate from idle, offline, and exception entry/exit in no
 // way allows rcu_preempt and rcu_sched readers to also do so.
+//
+// The implementation uses rcu_tasks_wait_gp(), which relies on function
+// pointers in the rcu_tasks structure.  The rcu_spawn_tasks_trace_kthread()
+// function sets these function pointers up so that rcu_tasks_wait_gp()
+// invokes these functions in this order:
+//
+// rcu_tasks_trace_pregp_step():
+//	Initialize the count of readers and block CPU-hotplug operations.
+// rcu_tasks_trace_pertask(), invoked on every non-idle task:
+//	Initialize per-task state and attempt to identify an immediate
+//	quiescent state for that task, or, failing that, attempt to set
+//	that task's .need_qs flag so that that task's next outermost
+//	rcu_read_unlock_trace() will report the quiescent state (in which
+//	case the count of readers is incremented).  If both attempts fail,
+//	the task is added to a "holdout" list.
+// rcu_tasks_trace_postscan():
+//	Initialize state and attempt to identify an immediate quiescent
+//	state as above (but only for idle tasks), unblock CPU-hotplug
+//	operations, and wait for an RCU grace period to avoid races with
+//	tasks that are in the process of exiting.
+// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty:
+//	Scans the holdout list, attempting to identify a quiescent state
+//	for each task on the list.  If there is a quiescent state, the
+//	corresponding task is removed from the holdout list.
+// rcu_tasks_trace_postgp():
+//	Wait for the count of readers do drop to zero, reporting any stalls.
+//	Also execute full memory barriers to maintain ordering with code
+//	executing after the grace period.
+//
+// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks.
+//
+// Pre-grace-period update-side code is ordered before the grace
+// period via the ->cbs_lock and barriers in rcu_tasks_kthread().
+// Pre-grace-period read-side code is ordered before the grace period by
+// atomic_dec_and_test() of the count of readers (for IPIed readers) and by
+// scheduler context-switch ordering (for locked-down non-running readers).
 
 // The lockdep state must be outside of #ifdef to be useful.
 #ifdef CONFIG_DEBUG_LOCK_ALLOC

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 18:33     ` Paul E. McKenney
@ 2021-02-25 20:20       ` Mathieu Desnoyers
  2021-02-25 20:55         ` Paul E. McKenney
  2021-02-25 22:23       ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-02-25 20:20 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

----- On Feb 25, 2021, at 1:33 PM, paulmck paulmck@kernel.org wrote:
[...]
> commit 581f79546b6be406a9c7280b2d3511b60821efe0
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Feb 25 10:26:00 2021 -0800
> 
>    rcu-tasks: Add block comment laying out RCU Tasks Trace design
>    
>    This commit adds a block comment that gives a high-level overview of
>    how RCU tasks trace grace periods progress.  It also adds a note about
>    how exiting tasks are handles, plus it gives an overview of the memory

handles -> handled

>    ordering.
>    
>    Reported-by: Peter Zijlstra <peterz@infradead.org>
>    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 17c8ebe..f818357 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
> // flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
> // readers can operate from idle, offline, and exception entry/exit in no
> // way allows rcu_preempt and rcu_sched readers to also do so.
> +//
> +// The implementation uses rcu_tasks_wait_gp(), which relies on function
> +// pointers in the rcu_tasks structure.  The rcu_spawn_tasks_trace_kthread()
> +// function sets these function pointers up so that rcu_tasks_wait_gp()
> +// invokes these functions in this order:
> +//
> +// rcu_tasks_trace_pregp_step():
> +//	Initialize the count of readers and block CPU-hotplug operations.
> +// rcu_tasks_trace_pertask(), invoked on every non-idle task:
> +//	Initialize per-task state and attempt to identify an immediate
> +//	quiescent state for that task, or, failing that, attempt to set
> +//	that task's .need_qs flag so that that task's next outermost
> +//	rcu_read_unlock_trace() will report the quiescent state (in which
> +//	case the count of readers is incremented).  If both attempts fail,
> +//	the task is added to a "holdout" list.
> +// rcu_tasks_trace_postscan():
> +//	Initialize state and attempt to identify an immediate quiescent
> +//	state as above (but only for idle tasks), unblock CPU-hotplug
> +//	operations, and wait for an RCU grace period to avoid races with
> +//	tasks that are in the process of exiting.
> +// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty:
> +//	Scans the holdout list, attempting to identify a quiescent state
> +//	for each task on the list.  If there is a quiescent state, the
> +//	corresponding task is removed from the holdout list.
> +// rcu_tasks_trace_postgp():
> +//	Wait for the count of readers do drop to zero, reporting any stalls.
> +//	Also execute full memory barriers to maintain ordering with code
> +//	executing after the grace period.
> +//
> +// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks.
> +//
> +// Pre-grace-period update-side code is ordered before the grace
> +// period via the ->cbs_lock and barriers in rcu_tasks_kthread().
> +// Pre-grace-period read-side code is ordered before the grace period by
> +// atomic_dec_and_test() of the count of readers (for IPIed readers) and by
> +// scheduler context-switch ordering (for locked-down non-running readers).

The rest looks good, thanks!

Mathieu

> 
> // The lockdep state must be outside of #ifdef to be useful.
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 20:20       ` Mathieu Desnoyers
@ 2021-02-25 20:55         ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-02-25 20:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

On Thu, Feb 25, 2021 at 03:20:34PM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 25, 2021, at 1:33 PM, paulmck paulmck@kernel.org wrote:
> [...]
> > commit 581f79546b6be406a9c7280b2d3511b60821efe0
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Feb 25 10:26:00 2021 -0800
> > 
> >    rcu-tasks: Add block comment laying out RCU Tasks Trace design
> >    
> >    This commit adds a block comment that gives a high-level overview of
> >    how RCU tasks trace grace periods progress.  It also adds a note about
> >    how exiting tasks are handles, plus it gives an overview of the memory
> 
> handles -> handled

Good eyes, fixed!

> >    ordering.
> >    
> >    Reported-by: Peter Zijlstra <peterz@infradead.org>
> >    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index 17c8ebe..f818357 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread);
> > // flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
> > // readers can operate from idle, offline, and exception entry/exit in no
> > // way allows rcu_preempt and rcu_sched readers to also do so.
> > +//
> > +// The implementation uses rcu_tasks_wait_gp(), which relies on function
> > +// pointers in the rcu_tasks structure.  The rcu_spawn_tasks_trace_kthread()
> > +// function sets these function pointers up so that rcu_tasks_wait_gp()
> > +// invokes these functions in this order:
> > +//
> > +// rcu_tasks_trace_pregp_step():
> > +//	Initialize the count of readers and block CPU-hotplug operations.
> > +// rcu_tasks_trace_pertask(), invoked on every non-idle task:
> > +//	Initialize per-task state and attempt to identify an immediate
> > +//	quiescent state for that task, or, failing that, attempt to set
> > +//	that task's .need_qs flag so that that task's next outermost
> > +//	rcu_read_unlock_trace() will report the quiescent state (in which
> > +//	case the count of readers is incremented).  If both attempts fail,
> > +//	the task is added to a "holdout" list.
> > +// rcu_tasks_trace_postscan():
> > +//	Initialize state and attempt to identify an immediate quiescent
> > +//	state as above (but only for idle tasks), unblock CPU-hotplug
> > +//	operations, and wait for an RCU grace period to avoid races with
> > +//	tasks that are in the process of exiting.
> > +// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty:
> > +//	Scans the holdout list, attempting to identify a quiescent state
> > +//	for each task on the list.  If there is a quiescent state, the
> > +//	corresponding task is removed from the holdout list.
> > +// rcu_tasks_trace_postgp():
> > +//	Wait for the count of readers do drop to zero, reporting any stalls.
> > +//	Also execute full memory barriers to maintain ordering with code
> > +//	executing after the grace period.
> > +//
> > +// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks.
> > +//
> > +// Pre-grace-period update-side code is ordered before the grace
> > +// period via the ->cbs_lock and barriers in rcu_tasks_kthread().
> > +// Pre-grace-period read-side code is ordered before the grace period by
> > +// atomic_dec_and_test() of the count of readers (for IPIed readers) and by
> > +// scheduler context-switch ordering (for locked-down non-running readers).
> 
> The rest looks good, thanks!

Thank you for looking it over!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 18:33     ` Paul E. McKenney
  2021-02-25 20:20       ` Mathieu Desnoyers
@ 2021-02-25 22:23       ` Peter Zijlstra
  2021-02-25 23:05         ` Paul E. McKenney
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2021-02-25 22:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, linux-kernel, rcu, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

On Thu, Feb 25, 2021 at 10:33:21AM -0800, Paul E. McKenney wrote:
> One question for Peter...  Does each and every context switch imply a
> full barrier?

Yes, also see the smp_mb__after_spinlock() in __schedule() :-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: tasks-trace RCU: question about grace period forward progress
  2021-02-25 22:23       ` Peter Zijlstra
@ 2021-02-25 23:05         ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-02-25 23:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, linux-kernel, rcu, Josh Triplett, rostedt,
	Lai Jiangshan, Joel Fernandes, Google

On Thu, Feb 25, 2021 at 11:23:18PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 25, 2021 at 10:33:21AM -0800, Paul E. McKenney wrote:
> > One question for Peter...  Does each and every context switch imply a
> > full barrier?
> 
> Yes, also see the smp_mb__after_spinlock() in __schedule() :-)

Whew!!!  ;-)

Yeah, I could make RCU Tasks Trace deal with lack of a full barrier in
that case, but I would rather not...  I could imagine optimizing so
that the full barrier happened only when tasks migrated, but I could
also imagine a world of hurt stemming from such an optimization!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-02-25 23:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 14:22 tasks-trace RCU: question about grace period forward progress Mathieu Desnoyers
2021-02-25 15:36 ` Paul E. McKenney
2021-02-25 15:47   ` Mathieu Desnoyers
2021-02-25 18:33     ` Paul E. McKenney
2021-02-25 20:20       ` Mathieu Desnoyers
2021-02-25 20:55         ` Paul E. McKenney
2021-02-25 22:23       ` Peter Zijlstra
2021-02-25 23:05         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).