linux-kernel.vger.kernel.org archive mirror
* Use case for TASKS_RCU
@ 2017-05-15 18:23 Paul E. McKenney
  2017-05-15 18:48 ` Steven Rostedt
  2017-05-16  6:22 ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-15 18:23 UTC (permalink / raw)
  To: mingo, rostedt; +Cc: linux-kernel

Hello!

The question of the use case for TASKS_RCU came up, and here is my
understanding.  Steve will not be shy about correcting any misconceptions
I might have.  ;-)

The use case is to support freeing of trampolines used in tracing/probing
in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
executing in the trampoline in question has left it, taking into account
that the trampoline's code might be interrupted and preempted.  However,
the code in the trampolines is guaranteed never to context switch.

Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
It is therefore tempting to think in terms of disabling preemption across
the trampolines, but there is apparently not enough room to accommodate
the needed preempt_disable() and preempt_enable() in the code invoking
the trampoline, and putting the preempt_disable() and preempt_enable()
in the trampoline itself fails because of the possibility of preemption
just before the preempt_disable() and just after the preempt_enable().
Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
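
To make the failure windows concrete, here is a minimal sketch (hypothetical
function names, not the real ftrace code) of why placing the pair inside the
trampoline cannot work:

#include <linux/preempt.h>

extern void tracer_callback(void);	/* hypothetical tracer callback */

/* Hypothetical trampoline body -- illustrative only, not actual ftrace code. */
static void trampoline_sketch(void)
{
	/*
	 * The task can be preempted right here, already executing in the
	 * trampoline, before preempt_disable() below has run ...
	 */
	preempt_disable();
	tracer_callback();
	preempt_enable();
	/*
	 * ... and again here, still inside the trampoline with preemption
	 * re-enabled, so synchronize_sched() alone cannot guarantee the
	 * trampoline text is unused when it is freed.
	 */
}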

Another possibility would be to place the trampolines in a known region
of memory, and check for the task's PC being in that region.  This fails
because trampolines can be interrupted, and I vaguely recall something
about them calling functions as well.  Stack tracing could be added,
but stack tracing is not as reliable as it would need to be.

The solution chosen relies on the fact that code in trampolines
(and code invoked from trampolines) is not permitted to do voluntary
context switches.  Thus, if a trampoline is removed, and a given task
later does a voluntary context switch (or has been seen in usermode),
that task will never again reference that trampoline.  Once all tasks
are accounted for, the trampoline may safely be removed.
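
In other words, the update side becomes a simple three-step sequence.  A
minimal sketch, with hypothetical helper names (only synchronize_rcu_tasks()
is the real API):

#include <linux/rcupdate.h>

struct trampoline;						/* hypothetical type   */
extern void unhook_trampoline(struct trampoline *tramp);	/* hypothetical helper */
extern void free_trampoline_pages(struct trampoline *tramp);	/* hypothetical helper */

static void release_trampoline(struct trampoline *tramp)
{
	unhook_trampoline(tramp);	/* no new calls can enter the trampoline    */
	synchronize_rcu_tasks();	/* every task has since done a voluntary
					 * context switch or been seen in usermode */
	free_trampoline_pages(tramp);	/* so nothing can still be executing in,
					 * or preempted within, the trampoline     */
}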

TASKS_RCU implements a flavor of RCU that does exactly this.  It has
only a single use at the moment, but avoiding memory leaks on
production machines being instrumented seems to me to be quite valuable.

So, Steve, please correct any misconceptions!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-15 18:23 Use case for TASKS_RCU Paul E. McKenney
@ 2017-05-15 18:48 ` Steven Rostedt
  2017-05-15 20:12   ` Paul E. McKenney
  2017-05-16  6:22 ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-15 18:48 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: mingo, linux-kernel

On Mon, 15 May 2017 11:23:54 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Hello!
> 
> The question of the use case for TASKS_RCU came up, and here is my
> understanding.  Steve will not be shy about correcting any misconceptions
> I might have.  ;-)
> 
> The use case is to support freeing of trampolines used in tracing/probing
> in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> executing in the trampoline in question has left it, taking into account
> that the trampoline's code might be interrupted and preempted.  However,
> the code in the trampolines is guaranteed never to context switch.

nit, "never to voluntarily context switch" as it can still be
preempted. It should never call schedule nor a mutex. And really it
shouldn't even call any spinlocks. Although, trace_stack does, but it
does so after checking if in_nmi(), which it bails if that is true.
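
Roughly this pattern (a simplified sketch, not the actual stack-tracer code;
the lock name is a stand-in):

#include <linux/hardirq.h>
#include <linux/irqflags.h>
#include <linux/spinlock.h>

static arch_spinlock_t stack_lock = __ARCH_SPIN_LOCK_UNLOCKED;	/* stand-in name */

static void check_stack_sketch(void)
{
	unsigned long flags;

	if (in_nmi())			/* taking the lock from NMI could deadlock, */
		return;			/* so bail out instead of tracing this hit  */

	local_irq_save(flags);
	arch_spin_lock(&stack_lock);
	/* ... record the deepest stack usage seen so far ... */
	arch_spin_unlock(&stack_lock);
	local_irq_restore(flags);
}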

> 
> Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> It is therefore tempting to think in terms of disabling preemption across
> the trampolines, but there is apparently not enough room to accommodate
> the needed preempt_disable() and preempt_enable() in the code invoking
> the trampoline, and putting the preempt_disable() and preempt_enable()
> in the trampoline itself fails because of the possibility of preemption
> just before the preempt_disable() and just after the preempt_enable().
> Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().

Correct, as the jump to the trampoline may be preempted. And preemption
can happen just before the first instruction of the trampoline is
executed.


> 
> Another possibility would be to place the trampolines in a known region
> of memory, and check for the task's PC being in that region.  This fails
> because trampolines can be interrupted, and I vaguely recall something
> about them calling function as well.  Stack tracing could be added,
> but stack tracing is not as reliable as it would need to be.

Correct.

> 
> The solution chosen relies on the fact that code in trampolines
> (and code invoked from trampolines) is not permitted to do voluntary
> context switches.  Thus, if a trampoline is removed, and a given task
> later does a voluntary context switch (or has been seen in usermode),
> that task will never again reference that trampoline.  Once all tasks
> are accounted for, the trampoline may safely be removed.

Correct.

> 
> TASKS_RCU implements a flavor of RCU that does exactly this.  It has
> only a single use at the moment, but avoiding memory leaks on
> production machines being instrumented seems to me to be quite valuable.

Optimized kprobes can also benefit from this, as it currently is
disabled on CONFIG_PREEMPT due to exactly the same issue. I'll poke
Masami about this again. I should be seeing him in a couple of weeks at
the Open Source Summit in Tokyo.


> 
> So, Steve, please correct any misconceptions!

Nope, all looks good.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-15 18:48 ` Steven Rostedt
@ 2017-05-15 20:12   ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-15 20:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, linux-kernel

On Mon, May 15, 2017 at 02:48:10PM -0400, Steven Rostedt wrote:
> On Mon, 15 May 2017 11:23:54 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Hello!
> > 
> > The question of the use case for TASKS_RCU came up, and here is my
> > understanding.  Steve will not be shy about correcting any misconceptions
> > I might have.  ;-)
> > 
> > The use case is to support freeing of trampolines used in tracing/probing
> > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > executing in the trampoline in question has left it, taking into account
> > that the trampoline's code might be interrupted and preempted.  However,
> > the code in the trampolines is guaranteed never to context switch.
> 
> nit, "never to voluntarily context switch" as it can still be
> preempted. It should never call schedule nor a mutex. And really it
> shouldn't even call any spinlocks. Although, trace_stack does, but it
> does so after checking if in_nmi(), which it bails if that is true.

Good catch, thank you!  And thank you for the checking on the rest.

Ingo, thoughts?

							Thanx, Paul

> > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > It is therefore tempting to think in terms of disabling preemption across
> > the trampolines, but there is apparently not enough room to accommodate
> > the needed preempt_disable() and preempt_enable() in the code invoking
> > the trampoline, and putting the preempt_disable() and preempt_enable()
> > in the trampoline itself fails because of the possibility of preemption
> > just before the preempt_disable() and just after the preempt_enable().
> > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> 
> Correct, as the jump to the trampoline may be preempted. And preemption
> happens just before the first instruction on the trampoline is being
> executed.
> 
> 
> > 
> > Another possibility would be to place the trampolines in a known region
> > of memory, and check for the task's PC being in that region.  This fails
> > because trampolines can be interrupted, and I vaguely recall something
> > about them calling function as well.  Stack tracing could be added,
> > but stack tracing is not as reliable as it would need to be.
> 
> Correct.
> 
> > 
> > The solution chosen relies on the fact that code in trampolines
> > (and code invoked from trampolines) is not permitted to do voluntary
> > context switches.  Thus, if a trampoline is removed, and a given task
> > later does a voluntary context switch (or has been seen in usermode),
> > that task will never again reference that trampoline.  Once all tasks
> > are accounted for, the trampoline may safely be removed.
> 
> Correct.
> 
> > 
> > TASKS_RCU implements a flavor of RCU that does exactly this.  It has
> > only a single use at the moment, but avoiding memory leaks on
> > production machines being instrumented seems to me to be quite valuable.
> 
> Optimized kprobes can also benefit from this, as it currently is
> disabled on CONFIG_PREEMPT due to exactly the same issue. I'll poke
> Masami about this again. I should be seeing him in a couple of weeks at
> the Open Source Summit in Tokyo.
> 
> 
> > 
> > So, Steve, please correct any misconceptions!
> 
> Nope, all looks good.
> 
> -- Steve
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-15 18:23 Use case for TASKS_RCU Paul E. McKenney
  2017-05-15 18:48 ` Steven Rostedt
@ 2017-05-16  6:22 ` Ingo Molnar
  2017-05-16 12:23   ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2017-05-16  6:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rostedt, linux-kernel


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> Hello!
> 
> The question of the use case for TASKS_RCU came up, and here is my
> understanding.  Steve will not be shy about correcting any misconceptions
> I might have.  ;-)
> 
> The use case is to support freeing of trampolines used in tracing/probing
> in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> executing in the trampoline in question has left it, taking into account
> that the trampoline's code might be interrupted and preempted.  However,
> the code in the trampolines is guaranteed never to context switch.
> 
> Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> It is therefore tempting to think in terms of disabling preemption across
> the trampolines, but there is apparently not enough room to accommodate
> the needed preempt_disable() and preempt_enable() in the code invoking
> the trampoline, and putting the preempt_disable() and preempt_enable()
> in the trampoline itself fails because of the possibility of preemption
> just before the preempt_disable() and just after the preempt_enable().
> Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().

So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
the moment, so it's hard for me to review its users. What am I missing?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-16  6:22 ` Ingo Molnar
@ 2017-05-16 12:23   ` Paul E. McKenney
  2017-05-16 13:07     ` Steven Rostedt
  2017-05-19  6:23     ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-16 12:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, linux-kernel

On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Hello!
> > 
> > The question of the use case for TASKS_RCU came up, and here is my
> > understanding.  Steve will not be shy about correcting any misconceptions
> > I might have.  ;-)
> > 
> > The use case is to support freeing of trampolines used in tracing/probing
> > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > executing in the trampoline in question has left it, taking into account
> > that the trampoline's code might be interrupted and preempted.  However,
> > the code in the trampolines is guaranteed never to context switch.
> > 
> > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > It is therefore tempting to think in terms of disabling preemption across
> > the trampolines, but there is apparently not enough room to accommodate
> > the needed preempt_disable() and preempt_enable() in the code invoking
> > the trampoline, and putting the preempt_disable() and preempt_enable()
> > in the trampoline itself fails because of the possibility of preemption
> > just before the preempt_disable() and just after the preempt_enable().
> > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> 
> So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
> the moment, so it's hard for me to review its users. What am I missing?

Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.

Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
So yes, currently one user.
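
The shape of that usage is roughly the following (a paraphrased sketch rather
than the literal ftrace.c code; remove_ops_from_call_sites() is a stand-in
name, while synchronize_rcu_tasks() and arch_ftrace_trampoline_free() are the
real calls):

#include <linux/ftrace.h>
#include <linux/rcupdate.h>

extern void remove_ops_from_call_sites(struct ftrace_ops *ops);  /* stand-in name */

static void shutdown_dynamic_ops_sketch(struct ftrace_ops *ops)
{
	remove_ops_from_call_sites(ops);	/* stop new calls into the trampoline */

	/*
	 * On CONFIG_PREEMPT=y a task may still be preempted inside
	 * ops->trampoline, so wait until every task has done a voluntary
	 * context switch (or run in usermode) before freeing it.
	 */
	synchronize_rcu_tasks();

	arch_ftrace_trampoline_free(ops);	/* now safe to free the trampoline */
}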

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-16 12:23   ` Paul E. McKenney
@ 2017-05-16 13:07     ` Steven Rostedt
  2017-05-24  9:37       ` Masami Hiramatsu
  2017-05-19  6:23     ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-16 13:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Ingo Molnar, linux-kernel, Masami Hiramatsu

On Tue, 16 May 2017 05:23:54 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > Hello!
> > > 
> > > The question of the use case for TASKS_RCU came up, and here is my
> > > understanding.  Steve will not be shy about correcting any misconceptions
> > > I might have.  ;-)
> > > 
> > > The use case is to support freeing of trampolines used in tracing/probing
> > > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > > executing in the trampoline in question has left it, taking into account
> > > that the trampoline's code might be interrupted and preempted.  However,
> > > the code in the trampolines is guaranteed never to context switch.
> > > 
> > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > It is therefore tempting to think in terms of disabling preemption across
> > > the trampolines, but there is apparently not enough room to accommodate
> > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > in the trampoline itself fails because of the possibility of preemption
> > > just before the preempt_disable() and just after the preempt_enable().
> > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().  
> > 
> > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
> > the moment, so it's hard for me to review its users. What am I missing?  
> 
> Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.

Actually, things simply were not implemented. This is why optimized
kprobes are dependent on !CONFIG_PREEMPT. In fact, we can now optimize
kprobes on CONFIG_PREEMPT with this utility. Right, Masami?

With ftrace, perf and other "dynamic" users (where the ftrace_ops was
created via kmalloc) would not get the benefit of being called
directly. They all needed to have their mcount/fentry calls go through
a static trampoline that disabled preemption before calling the
callback. This static trampoline is shared by all, so even if perf was
the only callback for the function, it had to call this trampoline,
which iterated through all registered ftrace_ops to see which one had
a callback for the given function.

With this utility, perf not only gets the benefit of not having to use
that static loop trampoline, it can even have its own trampoline
created, one that doesn't need to check whether perf wants this
function or not, because the only way the trampoline is called is if
perf wanted it.
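
A toy comparison of the two call paths, written as self-contained C rather
than the real ftrace data structures:

/* Toy illustration only -- not the real ftrace code. */
struct tiny_ops {
	void (*func)(unsigned long ip);		/* the tracer's callback        */
	int (*wants)(unsigned long ip);		/* filter: trace this function? */
	struct tiny_ops *next;
};

static struct tiny_ops *ops_list;		/* all registered "tracers"     */

/* Old path: every traced function jumps to one shared trampoline, which
 * (with preemption disabled in the real code) walks the whole list. */
static void shared_list_handler(unsigned long ip)
{
	struct tiny_ops *op;

	for (op = ops_list; op; op = op->next)
		if (op->wants(ip))
			op->func(ip);
}

/* New path: a dedicated, dynamically allocated trampoline calls the one
 * interested callback directly -- no list walk, no filter check.  Freeing
 * such a trampoline safely is exactly what needs synchronize_rcu_tasks(). */
static void dedicated_handler(unsigned long ip, struct tiny_ops *op)
{
	op->func(ip);
}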

> 
> Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> So yes, currently one user.
> 

And the kpatch folks want to use it too.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-16 12:23   ` Paul E. McKenney
  2017-05-16 13:07     ` Steven Rostedt
@ 2017-05-19  6:23     ` Ingo Molnar
  2017-05-19 13:35       ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2017-05-19  6:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rostedt, linux-kernel, Peter Zijlstra, Thomas Gleixner


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Hello!
> > > 
> > > The question of the use case for TASKS_RCU came up, and here is my
> > > understanding.  Steve will not be shy about correcting any misconceptions
> > > I might have.  ;-)
> > > 
> > > The use case is to support freeing of trampolines used in tracing/probing
> > > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > > executing in the trampoline in question has left it, taking into account
> > > that the trampoline's code might be interrupted and preempted.  However,
> > > the code in the trampolines is guaranteed never to context switch.
> > > 
> > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > It is therefore tempting to think in terms of disabling preemption across
> > > the trampolines, but there is apparently not enough room to accommodate
> > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > in the trampoline itself fails because of the possibility of preemption
> > > just before the preempt_disable() and just after the preempt_enable().
> > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> > 
> > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
> > the moment, so it's hard for me to review its users. What am I missing?
> 
> Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.
> 
> Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> So yes, currently one user.

So why not schedule a worklet on every CPU to drive the trampoline freeing? To
guarantee that nothing was preempted, it could run at SCHED_IDLE, observe
nr_running from the worklet, and use a short timeout loop. Batching and hysteresis
would ensure that this only runs rarely in practice.

It doesn't have to be fast or particularly elegant, but it could use existing
kernel facilities just fine: it's a corner-case cost and a quirk of our live kernel
text modifying trampoline code and our current CONFIG_PREEMPT=y model.

I.e. don't make it an RCU facility that not only complicates the RCU code but
also imposes various costs in generic code:

kernel/exit.c:  TASKS_RCU(int tasks_rcu_i);
kernel/exit.c:  TASKS_RCU(preempt_disable());
kernel/exit.c:  TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
kernel/exit.c:  TASKS_RCU(preempt_enable());
kernel/exit.c:  TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));

I.e. I question that this should be a generic RCU facility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-19  6:23     ` Ingo Molnar
@ 2017-05-19 13:35       ` Paul E. McKenney
  2017-05-19 14:04         ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-19 13:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Fri, May 19, 2017 at 08:23:31AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> > > 
> > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > Hello!
> > > > 
> > > > The question of the use case for TASKS_RCU came up, and here is my
> > > > understanding.  Steve will not be shy about correcting any misconceptions
> > > > I might have.  ;-)
> > > > 
> > > > The use case is to support freeing of trampolines used in tracing/probing
> > > > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > > > executing in the trampoline in question has left it, taking into account
> > > > that the trampoline's code might be interrupted and preempted.  However,
> > > > the code in the trampolines is guaranteed never to context switch.
> > > > 
> > > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > > It is therefore tempting to think in terms of disabling preemption across
> > > > the trampolines, but there is apparently not enough room to accommodate
> > > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > > in the trampoline itself fails because of the possibility of preemption
> > > > just before the preempt_disable() and just after the preempt_enable().
> > > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> > > 
> > > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
> > > the moment, so it's hard for me to review its users. What am I missing?
> > 
> > Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.
> > 
> > Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> > So yes, currently one user.
> 
> So why not schedule a worklet on every CPU to drive the trampoline freeing? To 
> guarantee that nothing was preempted it could run at SCHED_IDLE and could observe 
> nr_running from the worklet and use a short timeout loop. Batching and hysteresis 
> would ensure that this is only running rarely in practice.
> 
> It doesn't have to be fast or particularly elegant, but it could use existing 
> kernel facilites just fine: it's a corner case cost and quirk of our live kernel 
> text modifying trampoline code and our current CONFIG_PREEMPT=y model.
> 
> I.e. don't make it an RCU facility that complicates not just the RCU code but has 
> various costs in generic code as well:
> 
> kernel/exit.c:  TASKS_RCU(int tasks_rcu_i);
> kernel/exit.c:  TASKS_RCU(preempt_disable());
> kernel/exit.c:  TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
> kernel/exit.c:  TASKS_RCU(preempt_enable());
> kernel/exit.c:  TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
> 
> I.e. I question that this should be a generic RCU facility.

Simpler would be better!

However, is it really guaranteed that one SCHED_IDLE thread cannot
preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
might preempt some other SCHED_IDLE thread in the middle of a trampoline.
I am not seeing anything that prevents such preemption, but it is rather
early local time, so I could easily be missing something.

However, if SCHED_IDLE threads cannot preempt other threads, even other
SCHED_IDLE threads, then your approach sounds quite promising to me.

Steve, Peter, thoughts?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-19 13:35       ` Paul E. McKenney
@ 2017-05-19 14:04         ` Steven Rostedt
  2017-05-19 14:23           ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-19 14:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Fri, 19 May 2017 06:35:50 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Simpler would be better!
> 
> However, is it really guaranteed that one SCHED_IDLE thread cannot
> preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
> might preempt some other SCHED_IDLE thread in the middle of a trampoline.
> I am not seeing anything that prevents such preemption, but it is rather
> early local time, so I could easily be missing something.
> 
> However, if SCHED_IDLE threads cannot preempt other threads, even other
> SCHED_IDLE threads, then your approach sounds quite promising to me.
> 
> Steve, Peter, thoughts?

SCHED_IDLE is the swapper task. There's one on each CPU, and they don't
migrate. And they only get called when there's no other task running.

My worry is that if you have a busy CPU, then this will never finish,
as the idle task can be preempted forever. That would cause this to
never return:

  perf record -e ftrace:function sleep 1

That command may never exit when the sleep finishes, because the call to
unregister_ftrace_function() won't return until synchronize_rcu_tasks()
returns. And worse yet, it waits in TASK_UNINTERRUPTIBLE state, where not
even a ctrl-C will stop it. It could possibly also trigger a hung-task
warning.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-19 14:04         ` Steven Rostedt
@ 2017-05-19 14:23           ` Steven Rostedt
  2017-05-19 19:06             ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-19 14:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Fri, 19 May 2017 10:04:21 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 19 May 2017 06:35:50 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Simpler would be better!
> > 
> > However, is it really guaranteed that one SCHED_IDLE thread cannot
> > preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
> > might preempt some other SCHED_IDLE thread in the middle of a trampoline.
> > I am not seeing anything that prevents such preemption, but it is rather
> > early local time, so I could easily be missing something.
> > 
> > However, if SCHED_IDLE threads cannot preempt other threads, even other
> > SCHED_IDLE threads, then your approach sounds quite promising to me.
> > 
> > Steve, Peter, thoughts?  
> 
> SCHED_IDLE is the swapper task. There's one on each CPU, and they don't
> migrate. And they only get called when there's no other task running.

Peter just "schooled" me on IRC. I stand corrected (and he may respond
to this email too). I guess any task can become SCHED_IDLE.

But that just makes this an even less likely option for
synchronize_rcu_tasks().

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-19 14:23           ` Steven Rostedt
@ 2017-05-19 19:06             ` Paul E. McKenney
  2017-05-23  0:00               ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-19 19:06 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:
> On Fri, 19 May 2017 10:04:21 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 19 May 2017 06:35:50 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Simpler would be better!
> > > 
> > > However, is it really guaranteed that one SCHED_IDLE thread cannot
> > > preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
> > > might preempt some other SCHED_IDLE thread in the middle of a trampoline.
> > > I am not seeing anything that prevents such preemption, but it is rather
> > > early local time, so I could easily be missing something.
> > > 
> > > However, if SCHED_IDLE threads cannot preempt other threads, even other
> > > SCHED_IDLE threads, then your approach sounds quite promising to me.
> > > 
> > > Steve, Peter, thoughts?  
> > 
> > SCHED_IDLE is the swapper task. There's one on each CPU, and they don't
> > migrate. And they only get called when there's no other task running.
> 
> Peter just "schooled" me on IRC. I stand corrected (and he may respond
> to this email too). I guess any task can become SCHED_IDLE.
> 
> But that just makes this an even less likely option for
> synchronize_rcu_tasks().

Hmmm...  The goal is to make sure that any task that was preempted or
running at a given point in time passes through a voluntary context switch
(or userspace execution, or, ...).

What is the simplest way to get this job done?  To Ingo's point, I bet
that there is a simpler way than the current TASKS_RCU implementation.

Ingo, if I make it fit into 100 lines of code, would you be OK with it?
I probably need a one-line hook at task-creation time and another
at task-exit time, if that makes a difference.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-19 19:06             ` Paul E. McKenney
@ 2017-05-23  0:00               ` Paul E. McKenney
  2017-05-23  5:19                 ` Steven Rostedt
  2017-05-23 19:39                 ` Steven Rostedt
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-23  0:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:
> > On Fri, 19 May 2017 10:04:21 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Fri, 19 May 2017 06:35:50 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > Simpler would be better!
> > > > 
> > > > However, is it really guaranteed that one SCHED_IDLE thread cannot
> > > > preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
> > > > might preempt some other SCHED_IDLE thread in the middle of a trampoline.
> > > > I am not seeing anything that prevents such preemption, but it is rather
> > > > early local time, so I could easily be missing something.
> > > > 
> > > > However, if SCHED_IDLE threads cannot preempt other threads, even other
> > > > SCHED_IDLE threads, then your approach sounds quite promising to me.
> > > > 
> > > > Steve, Peter, thoughts?  
> > > 
> > > SCHED_IDLE is the swapper task. There's one on each CPU, and they don't
> > > migrate. And they only get called when there's no other task running.
> > 
> > Peter just "schooled" me on IRC. I stand corrected (and he may respond
> > to this email too). I guess any task can become SCHED_IDLE.
> > 
> > But that just makes this an even less likely option for
> > synchronize_rcu_tasks().
> 
> Hmmm...  The goal is to make sure that any task that was preempted or
> running at a given point in time passes through a voluntary context switch
> (or userspace execution, or, ...).
> 
> What is the simplest way to get this job done?  To Ingo's point, I bet
> that there is a simpler way than the current TASKS_RCU implementation.
> 
> Ingo, if I make it fit into 100 lines of code, would you be OK with it?
> I probably need a one-line hook at task-creation time and another
> at task-exit time, if that makes a difference.

And please see below for such a patch, which does add (just barely)
fewer than 100 lines net.

Unfortunately, it does not work, as I should have known ahead of time
from the dyntick-idle experience.  Not all context switches go through
context_switch().  :-/

I believe this is fixable, more or less like dyntick-idle's half-interrupts
were fixable, but it will likely be a few days.  Not clear whether the
result will be simpler than current TASKS_RCU, but there is only one
way to find out.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

 include/linux/init_task.h |    7 +++++
 include/linux/rcupdate.h  |   12 ++++++++++
 include/linux/sched.h     |    5 ++++
 kernel/fork.c             |    4 +++
 kernel/rcu/Kconfig        |    8 +++---
 kernel/rcu/Kconfig.debug  |    2 -
 kernel/rcu/tree.c         |   15 +++++++++---
 kernel/rcu/tree_plugin.h  |   55 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c       |    3 ++
 9 files changed, 101 insertions(+), 10 deletions(-)

commit 25cb9b6882542f3db0242f6556b94e139c4b29d5
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon May 22 08:55:01 2017 -0700

    rcu: Provide simplified RCU-tasks implementation
    
    This commit provides a simplified RCU-tasks implementation in terms of
    SRCU for CONFIG_PREEMPT=y kernels and in terms of RCU-sched otherwise.
    The RCU-sched approach simply maps synchronize_rcu_tasks() directly onto
    synchronize_sched().
    
    The SRCU approach provides a tasks_srcu srcu_struct structure that
    tracks usermode execution and voluntary context switches.  When each
    task is created, it invokes tasks_rcu_qs_exit() and when each task exits
    it invokes tasks_rcu_qs_enter().  On context switch, tasks_rcu_qs_exit()
    is invoked for the incoming task, but only if that task is resuming from a
    voluntary context switch.  If the current context switch is a preemption,
    tasks_rcu_qs_enter() is invoked for the outgoing tasks, and either way
    the preempt-or-not state is recorded for the task's next resumption.
    A "context switch" that keeps the same task invokes tasks_rcu_qs().
    
    If the scheduling-clock interrupt sees a task in usermode, it invokes
    tasks_rcu_qs().  It is normally unsafe to do SRCU read-side operations
    from interrupt, but in this case, usermode was interrupted, so there is
    no chance of having interrupted another SRCU read-side operation.
    
    Given all this, synchronize_srcu(&tasks_srcu) waits for all tasks to
    be seen voluntarily blocked or in user space.  This means that
    synchronize_rcu_tasks() is a one-line function.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index bd11d33d48cc..631f3d3f2dc3 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -140,6 +140,13 @@ extern struct group_info init_groups;
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk)					\
+	.rcu_tasks_idx = 0,						\
+	.rcu_tasks_preempt = false,
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 extern struct cred init_cred;
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5088a44ce99c..1b2f104a467f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -156,6 +156,18 @@ static inline void rcu_init_nohz(void) { }
 		rcu_irq_exit_irqson(); \
 	} while (0)
 
+#ifdef CONFIG_TASKS_RCU
+void tasks_rcu_qs(bool preempt);
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt);
+void tasks_rcu_qs_exit(struct task_struct *t);
+void synchronize_rcu_tasks(void);
+#else /* #ifdef CONFIG_TASKS_RCU */
+static inline void tasks_rcu_qs(bool preempt) { }
+static inline void tasks_rcu_qs_enter(struct task_struct *t, bool preempt) { }
+static inline void tasks_rcu_qs_exit(struct task_struct *t) { }
+static inline void synchronize_rcu_tasks(void) { synchronize_sched(); }
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 /**
  * cond_resched_rcu_qs - Report potential quiescent states to RCU
  *
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7d9aa80349..1e5d27d1e885 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,11 @@ struct task_struct {
 	struct rcu_node			*rcu_blocked_node;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_TASKS_RCU
+	u8				rcu_tasks_idx;
+	u8				rcu_tasks_preempt;
+#endif /* #ifdef CONFIG_TASKS_RCU */
+
 	struct sched_info		sched_info;
 
 	struct list_head		tasks;
diff --git a/kernel/fork.c b/kernel/fork.c
index 1784f49b0cd4..8767d739eed4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1498,6 +1498,10 @@ static inline void rcu_copy_process(struct task_struct *p)
 	p->rcu_blocked_node = NULL;
 	INIT_LIST_HEAD(&p->rcu_node_entry);
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+	p->rcu_tasks_preempt = false;
+	tasks_rcu_qs_exit(p);
+#endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
 /*
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..3077568930f3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,13 +69,13 @@ config TREE_SRCU
 	  This option selects the full-fledged version of SRCU.
 
 config TASKS_RCU
-	bool
-	default n
+	def_bool PREEMPT_RCU
 	select SRCU
 	help
 	  This option enables a task-based RCU implementation that uses
-	  only voluntary context switch (not preemption!), idle, and
-	  user-mode execution as quiescent states.
+	  only voluntary context switch (not preemption!) and user-mode
+	  execution as quiescent states.  For !PREEMPT_RCU kernels,
+	  synchronize_tasks_rcu() maps to synchronize_sched().
 
 config RCU_STALL_COMMON
 	def_bool ( TREE_RCU || PREEMPT_RCU )
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..b6619ceedafc 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -16,7 +16,6 @@ config RCU_PERF_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs performance
@@ -33,7 +32,6 @@ config RCU_TORTURE_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8008541b04f2..05e7abecde68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -166,6 +166,7 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static void rcu_report_exp_rdp(struct rcu_state *rsp,
 			       struct rcu_data *rdp, bool wake);
 static void sync_sched_exp_online_cleanup(int cpu);
+static void rcu_all_qs_ctxt(bool ctxt);
 
 /* rcuc/rcub kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
@@ -467,7 +468,7 @@ void rcu_note_context_switch(bool preempt)
 		rcu_momentary_dyntick_idle();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
 	if (!preempt)
-		rcu_all_qs();
+		rcu_all_qs_ctxt(true);
 out:
 	trace_rcu_utilization(TPS("End context switch"));
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -480,14 +481,13 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  * dyntick-idle quiescent state visible to other CPUs (but only for those
  * RCU flavors in desperate need of a quiescent state, which will normally
  * be none of them).  Either way, do a lightweight quiescent state for
- * all RCU flavors.
+ * all RCU flavors.  If ctxt is false, also to tasks-RCU quiescent state.
  *
  * The barrier() calls are redundant in the common case when this is
  * called externally, but just in case this is called from within this
  * file.
- *
  */
-void rcu_all_qs(void)
+static void rcu_all_qs_ctxt(bool ctxt)
 {
 	unsigned long flags;
 
@@ -509,9 +509,16 @@ void rcu_all_qs(void)
 	if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
 		rcu_sched_qs();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+	if (!ctxt)
+		tasks_rcu_qs(false);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 	preempt_enable();
 }
+
+void rcu_all_qs(void)
+{
+	rcu_all_qs_ctxt(false);
+}
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d7325553cf82..21b254dd28b8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -783,6 +783,7 @@ void exit_rcu(void)
 	barrier();
 	t->rcu_read_unlock_special.b.blocked = true;
 	__rcu_read_unlock();
+	tasks_rcu_qs_enter(t, false);
 }
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -2597,3 +2598,57 @@ static void rcu_bind_gp_kthread(void)
 		return;
 	housekeeping_affine(current);
 }
+
+#ifdef CONFIG_TASKS_RCU
+
+DEFINE_STATIC_SRCU(tasks_srcu);
+
+/* Record a momentary RCU-tasks quiescent state. */
+void tasks_rcu_qs(bool preempt)
+{
+	int idx;
+
+	if (preempt || is_idle_task(current))
+		return;  /* Preempted, not a quiescent state. */
+	/* Acquire new before releasing old to allow tracing SRCU. */
+	preempt_disable();
+	idx = __srcu_read_lock(&tasks_srcu);
+	__srcu_read_unlock(&tasks_srcu, current->rcu_tasks_idx);
+	current->rcu_tasks_idx = idx;
+	preempt_enable();
+}
+
+/* Record entering an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt)
+{
+	t->rcu_tasks_preempt = preempt;
+	if (preempt || is_idle_task(t))
+		return;  /* Preempted, not a quiescent state. */
+	preempt_disable();
+	__srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
+	preempt_enable();
+}
+
+/* Record exiting an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_exit(struct task_struct *t)
+{
+	if (t->rcu_tasks_preempt || is_idle_task(t))
+		return;  /* Preempted, not a quiescent state. */
+	preempt_disable();
+	t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
+	preempt_enable();
+}
+
+/**
+ * synchronize_rcu_tasks - Wait for all tasks to voluntarily leave the kernel
+ *
+ * Waits until each task is either voluntarily blocked, running in
+ * userspace, or has offered to block (as in cond_resched_rcu_qs()).
+ * Idle tasks are ignored.
+ */
+void synchronize_rcu_tasks(void)
+{
+	synchronize_srcu(&tasks_srcu);
+}
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75353aa53ae3..1ba79734377f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3430,6 +3430,7 @@ static void __sched notrace __schedule(bool preempt)
 	clear_preempt_need_resched();
 
 	if (likely(prev != next)) {
+		tasks_rcu_qs_exit(next);
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;
@@ -3438,9 +3439,11 @@ static void __sched notrace __schedule(bool preempt)
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
+		tasks_rcu_qs_enter(prev, preempt);
 	} else {
 		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unlock_irq(rq, &rf);
+		tasks_rcu_qs(preempt);
 	}
 
 	balance_callback(rq);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23  0:00               ` Paul E. McKenney
@ 2017-05-23  5:19                 ` Steven Rostedt
  2017-05-23 15:33                   ` Paul E. McKenney
  2017-05-23 19:39                 ` Steven Rostedt
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-23  5:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Mon, 22 May 2017 17:00:36 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> > On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:  
> > > On Fri, 19 May 2017 10:04:21 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >   
> > > > On Fri, 19 May 2017 06:35:50 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > >   
> > > > > Simpler would be better!
> > > > > 
> > > > > However, is it really guaranteed that one SCHED_IDLE thread
> > > > > cannot preempt another?  If not, then the trampoline-freeing
> > > > > SCHED_IDLE thread might preempt some other SCHED_IDLE thread
> > > > > in the middle of a trampoline. I am not seeing anything that
> > > > > prevents such preemption, but it is rather early local time,
> > > > > so I could easily be missing something.
> > > > > 
> > > > > However, if SCHED_IDLE threads cannot preempt other threads,
> > > > > even other SCHED_IDLE threads, then your approach sounds
> > > > > quite promising to me.
> > > > > 
> > > > > Steve, Peter, thoughts?    
> > > > 
> > > > SCHED_IDLE is the swapper task. There's one on each CPU, and
> > > > they don't migrate. And they only get called when there's no
> > > > other task running.  
> > > 
> > > Peter just "schooled" me on IRC. I stand corrected (and he may
> > > respond to this email too). I guess any task can become
> > > SCHED_IDLE.
> > > 
> > > But that just makes this an even less likely option for
> > > synchronize_rcu_tasks().  
> > 
> > Hmmm...  The goal is to make sure that any task that was preempted
> > or running at a given point in time passes through a voluntary
> > context switch (or userspace execution, or, ...).
> > 
> > What is the simplest way to get this job done?  To Ingo's point, I
> > bet that there is a simpler way than the current TASKS_RCU
> > implementation.
> > 
> > Ingo, if I make it fit into 100 lines of code, would you be OK with
> > it? I probably need a one-line hook at task-creation time and
> > another at task-exit time, if that makes a difference.  
> 
> And please see below for such a patch, which does add (just barely)
> fewer than 100 lines net.
> 
> Unfortunately, it does not work, as I should have known ahead of time
> from the dyntick-idle experience.  Not all context switches go through
> context_switch().  :-/
> 
> I believe this is fixable, more or less like dyntick-idle's
> half-interrupts were fixable, but it will likely be a few days.  Not
> clear whether the result will be simpler than current TASKS_RCU, but
> there is only one way to find out.  ;-)
> 


Hi Paul,

I'm currently traveling and don't have the energy to look at code at
the moment. Hopefully, I can look at this more on Thursday or Friday.

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23  5:19                 ` Steven Rostedt
@ 2017-05-23 15:33                   ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-23 15:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, May 23, 2017 at 01:19:58AM -0400, Steven Rostedt wrote:
> On Mon, 22 May 2017 17:00:36 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> > > On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:  
> > > > On Fri, 19 May 2017 10:04:21 -0400
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >   
> > > > > On Fri, 19 May 2017 06:35:50 -0700
> > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > >   
> > > > > > Simpler would be better!
> > > > > > 
> > > > > > However, is it really guaranteed that one SCHED_IDLE thread
> > > > > > cannot preempt another?  If not, then the trampoline-freeing
> > > > > > SCHED_IDLE thread might preempt some other SCHED_IDLE thread
> > > > > > in the middle of a trampoline. I am not seeing anything that
> > > > > > prevents such preemption, but it is rather early local time,
> > > > > > so I could easily be missing something.
> > > > > > 
> > > > > > However, if SCHED_IDLE threads cannot preempt other threads,
> > > > > > even other SCHED_IDLE threads, then your approach sounds
> > > > > > quite promising to me.
> > > > > > 
> > > > > > Steve, Peter, thoughts?    
> > > > > 
> > > > > SCHED_IDLE is the swapper task. There's one on each CPU, and
> > > > > they don't migrate. And they only get called when there's no
> > > > > other task running.  
> > > > 
> > > > Peter just "schooled" me on IRC. I stand corrected (and he may
> > > > respond to this email too). I guess any task can become
> > > > SCHED_IDLE.
> > > > 
> > > > But that just makes this an even less likely option for
> > > > synchronize_rcu_tasks().  
> > > 
> > > Hmmm...  The goal is to make sure that any task that was preempted
> > > or running at a given point in time passes through a voluntary
> > > context switch (or userspace execution, or, ...).
> > > 
> > > What is the simplest way to get this job done?  To Ingo's point, I
> > > bet that there is a simpler way than the current TASKS_RCU
> > > implementation.
> > > 
> > > Ingo, if I make it fit into 100 lines of code, would you be OK with
> > > it? I probably need a one-line hook at task-creation time and
> > > another at task-exit time, if that makes a difference.  
> > 
> > And please see below for such a patch, which does add (just barely)
> > fewer than 100 lines net.
> > 
> > Unfortunately, it does not work, as I should have known ahead of time
> > from the dyntick-idle experience.  Not all context switches go through
> > context_switch().  :-/
> > 
> > I believe this is fixable, more or less like dyntick-idle's
> > half-interrupts were fixable, but it will likely be a few days.  Not
> > clear whether the result will be simpler than current TASKS_RCU, but
> > there is only one way to find out.  ;-)
> 
> Hi Paul,
> 
> I'm currently traveling and don't have the energy to look at code at
> the moment. Hopefully, I can look at this more on Thursday or Friday.

Hello, Steven,

No problem, as I learned that my analogy to dyntick-idle's half-interrupts
fails.  If a given task exits an extended quiescent state twice in a row, the
grace period was extended longer than it had to be, but no harm done.
The updated patch below handles this case.

But it turns out that tasks also enter extended quiescent states twice
in a row (from the perspective of context_switch()), which can result
in a too-short grace period.

So the only way that this general approach will work is to identify each
and every context-switch "sneak path" in each and every architecture,
which will be more complex than the current mainline RCU-tasks code.

So I am putting this on the back burner for the moment.  If anyone has
any ideas for a simple solution, please don't keep them a secret!

						Thanx, Paul

-----------------------------------------------------------------------

commit 71f0a84ceeb228789a09192901ddf3c7210c85ee
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon May 22 08:55:01 2017 -0700

    rcu: Provide simplified RCU-tasks implementation
    
    This commit provides a simplified RCU-tasks implementation in terms of
    SRCU for CONFIG_PREEMPT=y kernels and in terms of RCU-sched otherwise.
    The RCU-sched approach simply maps synchronize_rcu_tasks() directly onto
    synchronize_sched().
    
    The SRCU approach provides a tasks_srcu srcu_struct structure that
    tracks usermode execution and voluntary context switches.  When each
    task is created, it invokes tasks_rcu_qs_exit() and when each task exits
    it invokes tasks_rcu_qs_enter().  On context switch, tasks_rcu_qs_exit()
    is invoked for the incoming task, but only if that task is resuming from a
    voluntary context switch.  If the current context switch is a preemption,
    tasks_rcu_qs_enter() is invoked for the outgoing tasks, and either way
    the preempt-or-not state is recorded for the task's next resumption.
    A "context switch" that keeps the same task invokes tasks_rcu_qs().
    
    If the scheduling-clock interrupt sees a task in usermode, it invokes
    tasks_rcu_qs().  It is normally unsafe to do SRCU read-side operations
    from interrupt, but in this case, usermode was interrupted, so there is
    no chance of having interrupted another SRCU read-side operation.
    
    Given all this, synchronize_srcu(&tasks_srcu) waits for all tasks to
    be seen voluntarily blocked or in user space.  This means that
    synchronize_rcu_tasks() is a one-line function.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index bd11d33d48cc..d7700d969b8c 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -140,6 +140,14 @@ extern struct group_info init_groups;
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk)					\
+	.rcu_tasks_idx = 0,						\
+	.rcu_tasks_preempt = false,					\
+	.rcu_tasks_qs = false,
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 extern struct cred init_cred;
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5088a44ce99c..1b2f104a467f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -156,6 +156,18 @@ static inline void rcu_init_nohz(void) { }
 		rcu_irq_exit_irqson(); \
 	} while (0)
 
+#ifdef CONFIG_TASKS_RCU
+void tasks_rcu_qs(bool preempt);
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt);
+void tasks_rcu_qs_exit(struct task_struct *t);
+void synchronize_rcu_tasks(void);
+#else /* #ifdef CONFIG_TASKS_RCU */
+static inline void tasks_rcu_qs(bool preempt) { }
+static inline void tasks_rcu_qs_enter(struct task_struct *t, bool preempt) { }
+static inline void tasks_rcu_qs_exit(struct task_struct *t) { }
+static inline void synchronize_rcu_tasks(void) { synchronize_sched(); }
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 /**
  * cond_resched_rcu_qs - Report potential quiescent states to RCU
  *
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7d9aa80349..2c1862c9e7fb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,12 @@ struct task_struct {
 	struct rcu_node			*rcu_blocked_node;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_TASKS_RCU
+	u8				rcu_tasks_idx;
+	u8				rcu_tasks_preempt;
+	u8				rcu_tasks_qs;
+#endif /* #ifdef CONFIG_TASKS_RCU */
+
 	struct sched_info		sched_info;
 
 	struct list_head		tasks;
diff --git a/kernel/fork.c b/kernel/fork.c
index 1784f49b0cd4..d1ec804eb4f9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1498,6 +1498,11 @@ static inline void rcu_copy_process(struct task_struct *p)
 	p->rcu_blocked_node = NULL;
 	INIT_LIST_HEAD(&p->rcu_node_entry);
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+	p->rcu_tasks_preempt = false;
+	p->rcu_tasks_qs = true;
+	tasks_rcu_qs_exit(p);
+#endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
 /*
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..3077568930f3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,13 +69,13 @@ config TREE_SRCU
 	  This option selects the full-fledged version of SRCU.
 
 config TASKS_RCU
-	bool
-	default n
+	def_bool PREEMPT_RCU
 	select SRCU
 	help
 	  This option enables a task-based RCU implementation that uses
-	  only voluntary context switch (not preemption!), idle, and
-	  user-mode execution as quiescent states.
+	  only voluntary context switch (not preemption!) and user-mode
+	  execution as quiescent states.  For !PREEMPT_RCU kernels,
+	  synchronize_tasks_rcu() maps to synchronize_sched().
 
 config RCU_STALL_COMMON
 	def_bool ( TREE_RCU || PREEMPT_RCU )
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..b6619ceedafc 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -16,7 +16,6 @@ config RCU_PERF_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs performance
@@ -33,7 +32,6 @@ config RCU_TORTURE_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8008541b04f2..05e7abecde68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -166,6 +166,7 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static void rcu_report_exp_rdp(struct rcu_state *rsp,
 			       struct rcu_data *rdp, bool wake);
 static void sync_sched_exp_online_cleanup(int cpu);
+static void rcu_all_qs_ctxt(bool ctxt);
 
 /* rcuc/rcub kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
@@ -467,7 +468,7 @@ void rcu_note_context_switch(bool preempt)
 		rcu_momentary_dyntick_idle();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
 	if (!preempt)
-		rcu_all_qs();
+		rcu_all_qs_ctxt(true);
 out:
 	trace_rcu_utilization(TPS("End context switch"));
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -480,14 +481,13 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  * dyntick-idle quiescent state visible to other CPUs (but only for those
  * RCU flavors in desperate need of a quiescent state, which will normally
  * be none of them).  Either way, do a lightweight quiescent state for
- * all RCU flavors.
+ * all RCU flavors.  If ctxt is false, also report a tasks-RCU quiescent state.
  *
  * The barrier() calls are redundant in the common case when this is
  * called externally, but just in case this is called from within this
  * file.
- *
  */
-void rcu_all_qs(void)
+static void rcu_all_qs_ctxt(bool ctxt)
 {
 	unsigned long flags;
 
@@ -509,9 +509,16 @@ void rcu_all_qs(void)
 	if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
 		rcu_sched_qs();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+	if (!ctxt)
+		tasks_rcu_qs(false);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 	preempt_enable();
 }
+
+void rcu_all_qs(void)
+{
+	rcu_all_qs_ctxt(false);
+}
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d7325553cf82..454882ed03c5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -783,6 +783,7 @@ void exit_rcu(void)
 	barrier();
 	t->rcu_read_unlock_special.b.blocked = true;
 	__rcu_read_unlock();
+	tasks_rcu_qs_enter(t, false);
 }
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -2597,3 +2598,63 @@ static void rcu_bind_gp_kthread(void)
 		return;
 	housekeeping_affine(current);
 }
+
+#ifdef CONFIG_TASKS_RCU
+
+DEFINE_STATIC_SRCU(tasks_srcu);
+
+/* Record a momentary RCU-tasks quiescent state. */
+void tasks_rcu_qs(bool preempt)
+{
+	int idx;
+
+	if (preempt || is_idle_task(current))
+		return;  /* Preempted or idle, not a quiescent state. */
+	/* Acquire new before releasing old to allow tracing SRCU. */
+	preempt_disable();
+	idx = __srcu_read_lock(&tasks_srcu);
+	__srcu_read_unlock(&tasks_srcu, current->rcu_tasks_idx);
+	current->rcu_tasks_idx = idx;
+	WARN_ON_ONCE(current->rcu_tasks_qs);
+	current->rcu_tasks_qs = false;
+	preempt_enable();
+}
+
+/* Record entering an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt)
+{
+	t->rcu_tasks_preempt = preempt;
+	if (preempt || is_idle_task(t))
+		return;  /* Preempted or idle, not a quiescent state. */
+	preempt_disable();
+	__srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
+	WARN_ON_ONCE(current->rcu_tasks_qs);
+	current->rcu_tasks_qs = true;
+	preempt_enable();
+}
+
+/* Record exiting an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_exit(struct task_struct *t)
+{
+	if (t->rcu_tasks_preempt || is_idle_task(t))
+		return;  /* Preempted or idle, not a quiescent state. */
+	preempt_disable();
+	if (current->rcu_tasks_qs)
+		t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
+	current->rcu_tasks_qs = false;
+	preempt_enable();
+}
+
+/**
+ * synchronize_rcu_tasks - Wait for all tasks to voluntarily leave the kernel
+ *
+ * Waits until each task is either voluntarily blocked, running in
+ * userspace, or has offered to block (as in cond_resched_rcu_qs()).
+ * Idle tasks are ignored.
+ */
+void synchronize_rcu_tasks(void)
+{
+	synchronize_srcu(&tasks_srcu);
+}
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75353aa53ae3..1ba79734377f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3430,6 +3430,7 @@ static void __sched notrace __schedule(bool preempt)
 	clear_preempt_need_resched();
 
 	if (likely(prev != next)) {
+		tasks_rcu_qs_exit(next);
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;
@@ -3438,9 +3439,11 @@ static void __sched notrace __schedule(bool preempt)
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
+		tasks_rcu_qs_enter(prev, preempt);
 	} else {
 		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unlock_irq(rq, &rf);
+		tasks_rcu_qs(preempt);
 	}
 
 	balance_callback(rq);

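As a minimal illustration of how a caller might use synchronize_rcu_tasks() to
retire a trampoline; the structure and helper names below are hypothetical and
are not part of the patch above:

	/* Hypothetical sketch only; my_unpatch_call_sites() is a made-up helper. */
	struct my_trampoline {
		void *buf;		/* executable trampoline text */
		size_t size;
	};

	static void my_release_trampoline(struct my_trampoline *tr)
	{
		/* Unhook all call sites so that no new entries into tr->buf occur. */
		my_unpatch_call_sites(tr);

		/*
		 * Wait until every task that might be running in (or preempted
		 * within) the trampoline has voluntarily context-switched or has
		 * been seen in user mode.
		 */
		synchronize_rcu_tasks();

		/* No task can now be executing in tr->buf, so it is safe to free. */
		module_memfree(tr->buf);
		kfree(tr);
	}
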
^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23  0:00               ` Paul E. McKenney
  2017-05-23  5:19                 ` Steven Rostedt
@ 2017-05-23 19:39                 ` Steven Rostedt
  2017-05-23 20:00                   ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-23 19:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Mon, 22 May 2017 17:00:36 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > 
> > Hmmm...  The goal is to make sure that any task that was preempted
> > or running at a given point in time passes through a voluntary
> > context switch (or userspace execution, or, ...).
> > 
> > What is the simplest way to get this job done?  To Ingo's point, I
> > bet that there is a simpler way than the current TASKS_RCU
> > implementation.
> > 
> > Ingo, if I make it fit into 100 lines of code, would you be OK with
> > it? I probably need a one-line hook at task-creation time and
> > another at task-exit time, if that makes a difference.  
> 
> And please see below for such a patch, which does add (just barely)
> fewer than 100 lines net.
> 
> Unfortunately, it does not work, as I should have known ahead of time
> from the dyntick-idle experience.  Not all context switches go through
> context_switch().  :-/

Wait. What context switch doesn't go through a context switch? Or do
you mean a user/kernel context switch?

-- Steve

> 
> I believe this is fixable, more or less like dyntick-idle's
> half-interrupts were fixable, but it will likely be a few days.  Not
> clear whether the result will be simpler than current TASKS_RCU, but
> there is only one way to find out.  ;-)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23 19:39                 ` Steven Rostedt
@ 2017-05-23 20:00                   ` Paul E. McKenney
  2017-05-23 20:38                     ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-23 20:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, May 23, 2017 at 03:39:39PM -0400, Steven Rostedt wrote:
> On Mon, 22 May 2017 17:00:36 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > 
> > > Hmmm...  The goal is to make sure that any task that was preempted
> > > or running at a given point in time passes through a voluntary
> > > context switch (or userspace execution, or, ...).
> > > 
> > > What is the simplest way to get this job done?  To Ingo's point, I
> > > bet that there is a simpler way than the current TASKS_RCU
> > > implementation.
> > > 
> > > Ingo, if I make it fit into 100 lines of code, would you be OK with
> > > it? I probably need a one-line hook at task-creation time and
> > > another at task-exit time, if that makes a difference.  
> > 
> > And please see below for such a patch, which does add (just barely)
> > fewer than 100 lines net.
> > 
> > Unfortunately, it does not work, as I should have known ahead of time
> > from the dyntick-idle experience.  Not all context switches go through
> > context_switch().  :-/
> 
> Wait. What context switch doesn't go through a context switch? Or do
> you mean a user/kernel context switch?

I mean that putting printk() before and after the call to context_switch()
can show tasks switching out twice without switching in and vice versa.
No sign of lost printk()s, and I also confirmed this behavior using a
flag in task_struct.
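
(A rough sketch of that sort of task_struct-flag instrumentation, purely for
illustration; the field and helper names here are invented and are not in the
patch:

	/* Hypothetical: add "u8 sched_switched_out;" to struct task_struct. */
	static void debug_note_switch_out(struct task_struct *prev)
	{
		WARN_ON_ONCE(prev->sched_switched_out);	/* switched out twice? */
		prev->sched_switched_out = 1;
	}

	static void debug_note_switch_in(struct task_struct *next)
	{
		/* Freshly created tasks would need special-casing here. */
		WARN_ON_ONCE(!next->sched_switched_out);	/* switched in twice? */
		next->sched_switched_out = 0;
	}

with the first called just before and the second just after the call to
context_switch() in __schedule().)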

One way that this can happen on some architectures is via the "helper"
mechanism, where the task sleeps normally, but where a later interrupt
or exception takes on its context "behind the scenes" in the arch code.
This is what messed up my attempt to use a simple interrupt-nesting
counter for RCU dynticks some years back.  What I counted on there was
that the idle loop would never do that sort of thing, so I could zero
the count when entering idle from process context.

But I have not yet found a similar trick for counting voluntary
context switches.

I also tried making context_switch() look like a momentary quiescent
state, but of course that means that tasks that block forever also
block the grace period forever.  At which point, I need to scan the task
list to find them.  And that pretty much brings me back to the current
RCU-tasks implementation.  :-/

							Thanx, Paul

> -- Steve
> 
> > 
> > I believe this is fixable, more or less like dyntick-idle's
> > half-interrupts were fixable, but it will likely be a few days.  Not
> > clear whether the result will be simpler than current TASKS_RCU, but
> > there is only one way to find out.  ;-)
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23 20:00                   ` Paul E. McKenney
@ 2017-05-23 20:38                     ` Steven Rostedt
  2017-05-23 21:10                       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2017-05-23 20:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, 23 May 2017 13:00:35 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

 
> > > Unfortunately, it does not work, as I should have known ahead of
> > > time from the dyntick-idle experience.  Not all context switches
> > > go through context_switch().  :-/  
> > 
> > Wait. What context switch doesn't go through a context switch? Or do
> > you mean a user/kernel context switch?  
> 
> I mean that putting printk() before and after the call to
> context_switch() can show tasks switching out twice without switching
> in and vice versa. No sign of lost printk()s, and I also confirmed
> this behavior using a flag in task_struct.

I hope you meant trace_printk()s, as printk() has huge overhead and can
cause side effects.

> 
> One way that this can happen on some architectures is via the "helper"
> mechanism, where the task sleeps normally, but where a later interrupt
> or exception takes on its context "behind the scenes" in the arch
> code. This is what messed up my attempt to use a simple
> interrupt-nesting counter for RCU dynticks some years back.  What I
> counted on there was that the idle loop would never do that sort of
> thing, so I could zero the count when entering idle from process
> context.
> 
> But I have not yet found a similar trick for counting voluntary
> context switches.
> 
> I also tried making context_switch() look like a momentary quiescent
> state, but of course that means that tasks that block forever also
> block the grace period forever.  At which point, I need to scan the
> task list to find them.  And that pretty much brings me back to the
> current RCU-tasks implementation.  :-/
> 

Nothing should block in a preempted state forever, and if it does, that
means we want to wait forever. Because it could be preempted on the
trampoline.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-23 20:38                     ` Steven Rostedt
@ 2017-05-23 21:10                       ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2017-05-23 21:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner

On Tue, May 23, 2017 at 04:38:53PM -0400, Steven Rostedt wrote:
> On Tue, 23 May 2017 13:00:35 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> 
> > > > Unfortunately, it does not work, as I should have known ahead of
> > > > time from the dyntick-idle experience.  Not all context switches
> > > > go through context_switch().  :-/  
> > > 
> > > Wait. What context switch doesn't go through a context switch? Or do
> > > you mean a user/kernel context switch?  
> > 
> > I mean that putting printk() before and after the call to
> > context_switch() can show tasks switching out twice without switching
> > in and vice versa. No sign of lost printk()s, and I also confirmed
> > this behavior using a flag in task_struct.
> 
> I hope you meant trace_printk()s' as printk is a huge overhead and can
> cause side effects.

Not so much during boot.  But actually, I meant to ask you about that...

From what I can see in the ftrace documentation, booting with something
like this:

ftrace=function ftrace_filter=tasks_rcu_qs,tasks_rcu_qs_enter,tasks_rcu_qs_exit

Should enable ftrace, but only on the three functions called out.
But when I try this, I get the following in dmesg:

[    1.506171] ftrace bootup tracer 'function' not registered

And I don't get anything from ftrace_dump() later on.

What am I doing wrong here?  (Event tracing has worked for me in the
past from the boot line, but I was lazy so just fell back to printk().
And I didn't think of trace_printk().)

> > One way that this can happen on some architectures is via the "helper"
> > mechanism, where the task sleeps normally, but where a later interrupt
> > or exception takes on its context "behind the scenes" in the arch
> > code. This is what messed up my attempt to use a simple
> > interrupt-nesting counter for RCU dynticks some years back.  What I
> > counted on there was that the idle loop would never do that sort of
> > thing, so I could zero the count when entering idle from process
> > context.
> > 
> > But I have not yet found a similar trick for counting voluntary
> > context switches.
> > 
> > I also tried making context_switch() look like a momentary quiescent
> > state, but of course that means that tasks that block forever also
> > block the grace period forever.  At which point, I need to scan the
> > task list to find them.  And that pretty much brings me back to the
> > current RCU-tasks implementation.  :-/
> 
> Nothing should block in a preempted state forever, and if it does, that
> means we want to wait forever. Because it could be preempted on the
> trampoline.

Blocking in a preempted state is not the problem here.  Given that
the obvious hooks don't seem to be catching all of the switch-to and
switch-from events, blocking forever in a not-preempted state is
the problem.  I either need some way to see all of the switch-from
and switch-to events (and the ways I can see to do this have patch-size
and maintainability issues), or I need to go back to scanning the
task list.

And of course, all of the approaches that update state upon context
switch are slowing down a fastpath for the benefit of a slowpath,
which is not necessarily all that good of a thing.
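
For reference, here is a rough sketch of what the task-list-scanning approach
looks like; it is close in spirit to the existing TASKS_RCU grace-period
kthread but heavily simplified, and note_task()/poll_recorded_tasks() below
are hypothetical helpers:

	static void wait_for_voluntary_switches(void)
	{
		struct task_struct *g, *t;

		rcu_read_lock();
		for_each_process_thread(g, t) {
			if (t == current || is_idle_task(t))
				continue;
			/* Snapshot the voluntary context-switch count. */
			note_task(t, t->nvcsw);
		}
		rcu_read_unlock();

		/*
		 * Then poll: a recorded task can be forgotten once its
		 * ->nvcsw has advanced (it did a voluntary context switch)
		 * or it has been observed running in user mode.  When no
		 * recorded tasks remain, the grace period is complete.
		 */
		poll_recorded_tasks();
	}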

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Use case for TASKS_RCU
  2017-05-16 13:07     ` Steven Rostedt
@ 2017-05-24  9:37       ` Masami Hiramatsu
  0 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2017-05-24  9:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Ingo Molnar, linux-kernel, Masami Hiramatsu

Sorry, I missed this thread,

On Tue, 16 May 2017 09:07:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 16 May 2017 05:23:54 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> > > 
> > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > Hello!
> > > > 
> > > > The question of the use case for TASKS_RCU came up, and here is my
> > > > understanding.  Steve will not be shy about correcting any misconceptions
> > > > I might have.  ;-)
> > > > 
> > > > The use case is to support freeing of trampolines used in tracing/probing
> > > > in CONFIG_PREEMPT=y kernels.  It is necessary to wait until any task
> > > > executing in the trampoline in question has left it, taking into account
> > > > that the trampoline's code might be interrupted and preempted.  However,
> > > > the code in the trampolines is guaranteed never to context switch.
> > > > 
> > > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > > It is therefore tempting to think in terms of disabling preemption across
> > > > the trampolines, but there is apparently not enough room to accommodate
> > > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > > in the trampoline itself fails because of the possibility of preemption
> > > > just before the preempt_disable() and just after the preempt_enable().
> > > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().  
> > > 
> > > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at 
> > > the moment, so it's hard for me to review its users. What am I missing?  
> > 
> > Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.
> 
> Actually, things simply were not implemented. This is why optimized
> kprobes are dependent on !CONFIG_PREEMPT. In fact, we can now optimize
> kprobes on CONFIG_PREEMPT with this utility. Right, Masami?

Yes, I just haven't implemented it. OK, I'll use synchronize_rcu_tasks.

> With ftrace, perf and other "dynamic" users (where the ftrace_ops was
> created via kmalloc) would not get the benefit of being called
> directly. They all needed to have their mcount/fentry sites call a static
> trampoline that disabled preemption before calling the callback. This
> static trampoline is shared by all, so even if perf was the only
> callback for the function, it had to call this trampoline, which iterated
> through all registered ftrace_ops to see which one had a callback for
> the given function.
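
As a rough sketch of the shared "list" trampoline callback described in the
quoted paragraph above (heavily simplified; registered_ops_list and
ops_wants_ip() are invented names, not the real ftrace symbols):

	static void shared_list_func(unsigned long ip, unsigned long parent_ip)
	{
		struct ftrace_ops *op;

		preempt_disable_notrace();
		/* Walk every registered ftrace_ops; most will not want this ip. */
		for (op = registered_ops_list; op; op = op->next)
			if (ops_wants_ip(op, ip))
				op->func(ip, parent_ip, op, NULL);
		preempt_enable_notrace();
	}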

For optimized kprobes, execution always jumps into a dynamically allocated
trampoline, so we have no chance to disable preemption.

Thank you,

> 
> With this utility, perf not only gets the benefit of not having to use
> that static loop trampoline, it can even have its own trampoline
> created that doesn't need to check whether perf wants this
> function or not, as the only way the trampoline is called is if perf
> wanted it.
> 
> > 
> > Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> > So yes, currently one user.
> > 
> 
> And the kpatch folks want to use it too.
> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads: [~2017-05-24  9:38 UTC | newest]

Thread overview: 19+ messages
2017-05-15 18:23 Use case for TASKS_RCU Paul E. McKenney
2017-05-15 18:48 ` Steven Rostedt
2017-05-15 20:12   ` Paul E. McKenney
2017-05-16  6:22 ` Ingo Molnar
2017-05-16 12:23   ` Paul E. McKenney
2017-05-16 13:07     ` Steven Rostedt
2017-05-24  9:37       ` Masami Hiramatsu
2017-05-19  6:23     ` Ingo Molnar
2017-05-19 13:35       ` Paul E. McKenney
2017-05-19 14:04         ` Steven Rostedt
2017-05-19 14:23           ` Steven Rostedt
2017-05-19 19:06             ` Paul E. McKenney
2017-05-23  0:00               ` Paul E. McKenney
2017-05-23  5:19                 ` Steven Rostedt
2017-05-23 15:33                   ` Paul E. McKenney
2017-05-23 19:39                 ` Steven Rostedt
2017-05-23 20:00                   ` Paul E. McKenney
2017-05-23 20:38                     ` Steven Rostedt
2017-05-23 21:10                       ` Paul E. McKenney
