* [PATCH] fgraph: Convert ret_stack tasklist scanning to rcu
From: Davidlohr Bueso @ 2020-09-07  1:33 UTC
  To: rostedt; +Cc: mingo, oleg, linux-kernel, dave, Davidlohr Bueso

It seems that alloc_retstack_tasklist() can also take a lockless
approach to scanning the tasklist, instead of using the big global
tasklist_lock. For this we also kill another deprecated and RCU-unsafe
tsk->thread_group user, replacing it with for_each_process_thread(),
maintaining semantics.

Here tasklist_lock does not protect anything other than the list
against concurrent fork/exit. And considering that the whole thing
is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
problem to have a potentially stale, yet stable, list. The task cannot
go away either, so we don't risk racing with ftrace_graph_exit_task(),
which clears the ret_stack.

The tsk->ret_stack management is not protected by tasklist_lock anyway;
it is serialized by the corresponding publish/subscribe barriers against
concurrent ftrace_push_return_trace(). In addition, this plays nicer
with cachelines by avoiding two atomic ops in the uncontended case.
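
For reference, the publish/subscribe pairing looks roughly like this
(a simplified sketch of both sides, not the exact fgraph.c code):

	/* publish side: alloc_retstack_tasklist() */
	t->curr_ret_stack = 0;
	/* make sure the initialized fields are visible first: */
	smp_wmb();
	t->ret_stack = ret_stack_list[start++];

	/* subscribe side: ftrace_push_return_trace() */
	if (!current->ret_stack)
		return -EBUSY;
	/* the ret_stack test must happen before reading anything else: */
	smp_rmb();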

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/trace/fgraph.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 1af321dec0f1..5658f13037b3 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 		}
 	}
 
-	read_lock(&tasklist_lock);
-	do_each_thread(g, t) {
+	rcu_read_lock();
+	for_each_process_thread(g, t) {
 		if (start == end) {
 			ret = -EAGAIN;
 			goto unlock;
@@ -403,10 +403,10 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 			smp_wmb();
 			t->ret_stack = ret_stack_list[start++];
 		}
-	} while_each_thread(g, t);
+	}
 
 unlock:
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 free:
 	for (i = start; i < end; i++)
 		kfree(ret_stack_list[i]);
--
2.26.2



* Re: [PATCH] fgraph: Convert ret_stack tasklist scanning to rcu
From: Oleg Nesterov @ 2020-09-07 11:43 UTC
  To: Davidlohr Bueso; +Cc: rostedt, mingo, linux-kernel, Davidlohr Bueso

On 09/06, Davidlohr Bueso wrote:
>
> Here tasklist_lock does not protect anything other than the list
> against concurrent fork/exit. And considering that the whole thing
> is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
> problem to have a potentially stale, yet stable, list. The task cannot
> go away either, so we don't risk racing with ftrace_graph_exit_task(),
> which clears the ret_stack.

I don't understand this code, but I think you're right: tasklist_lock
buys nothing.

Afaics, with or without this change alloc_retstack_tasklist() can race
with copy_process() and miss the new child; ftrace_graph_init_task()
can't help: ftrace_graph_active can be set right after the check, and
for_each_process_thread() can't see the new process yet.

This can't race with ftrace_graph_exit_task(); it is called after a
full grace-period pass. But this function looks very confusing to me;
I don't understand the barrier and the "NULL must become visible to
IRQs before we free it" comment.

Looks like ftrace_graph_exit_task() was called by the exiting task
in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
free the return stack on free_task()"). I think it makes sense to
simplify this function now; it can simply do kfree(t->ret_stack) and
nothing more.
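
Something like this (a sketch of the suggested simplification):

	void ftrace_graph_exit_task(struct task_struct *t)
	{
		kfree(t->ret_stack);
	}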

ACK, but ...

> @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
>  		}
>  	}
>  
> -	read_lock(&tasklist_lock);

then you should probably rename alloc_retstack_tasklist() ?

Oleg.



* Re: [PATCH] fgraph: Convert ret_stack tasklist scanning to rcu
From: Steven Rostedt @ 2020-09-18 17:12 UTC
  To: Oleg Nesterov; +Cc: Davidlohr Bueso, mingo, linux-kernel, Davidlohr Bueso

[ Back from my PTO and still digging out emails ]

On Mon, 7 Sep 2020 13:43:02 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 09/06, Davidlohr Bueso wrote:
> >
> > Here tasklist_lock does not protect anything other than the list
> > against concurrent fork/exit. And considering that the whole thing
> > is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
> > problem to have a potentially stale, yet stable, list. The task cannot
> > go away either, so we don't risk racing with ftrace_graph_exit_task(),
> > which clears the ret_stack.
> 
> I don't understand this code, but I think you're right: tasklist_lock
> buys nothing.

When I first wrote this code, I didn't want to take tasklist_lock, but
there were questions about whether rcu_read_lock() was enough. And since
this code is far from a fast path, I decided it was better to be safe
than sorry, and took the tasklist_lock as a paranoid measure.

> 
> Afaics, with or without this change alloc_retstack_tasklist() can race
> with copy_process() and miss the new child; ftrace_graph_init_task()
> can't help: ftrace_graph_active can be set right after the check, and
> for_each_process_thread() can't see the new process yet.

There's a call in copy_process(), ftrace_graph_init_task(), that
initializes a new task's ret_stack, and this loop will ignore it because
it first checks whether the task already has a ret_stack before adding
one. And the child gets one before being added to the list.
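
That is, the guard in the allocation loop looks roughly like this
(paraphrasing alloc_retstack_tasklist()):

	if (t->ret_stack == NULL) {
		atomic_set(&t->tracing_graph_pause, 0);
		atomic_set(&t->trace_overrun, 0);
		t->curr_ret_stack = 0;
		/* make sure the tasks see the 0 first: */
		smp_wmb();
		t->ret_stack = ret_stack_list[start++];
	}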

> 
> This can't race with ftrace_graph_exit_task(); it is called after a
> full grace-period pass. But this function looks very confusing to me;
> I don't understand the barrier and the "NULL must become visible to
> IRQs before we free it" comment.

Probably not needed, but again, being very paranoid, so as not to crash
anything. If this is called on a running task, and an interrupt comes in
after the ret_stack is freed but before the pointer is set to NULL, then
the interrupt will try to use it. I don't think this is possible, but it
may have been in the past.

> 
> Looks like ftrace_graph_exit_task() was called by the exiting task
> in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
> free the return stack on free_task()"). I think it makes sense to
> simplify this function now; it can simply do kfree(t->ret_stack) and
> nothing more.

Ah, yeah, then you are right. If it can't be called on a running task then
it can be simplified. Probably need a:

 WARN_ON_ONCE(t->on_rq);

just to make sure this never happens.
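
i.e., something along these lines (a sketch):

	void ftrace_graph_exit_task(struct task_struct *t)
	{
		/* must not be called on a task that can still run */
		WARN_ON_ONCE(t->on_rq);
		kfree(t->ret_stack);
	}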

> 
> ACK, but ...
> 
> > @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
> >  		}
> >  	}
> >  
> > -	read_lock(&tasklist_lock);  
> 
> then you should probably rename alloc_retstack_tasklist() ?
> 

tasklist, process thread? Is there a difference?

Thanks for reviewing this!

-- Steve


* Re: [PATCH] fgraph: Convert ret_stack tasklist scanning to rcu
From: Oleg Nesterov @ 2020-09-19 11:24 UTC
  To: Steven Rostedt; +Cc: Davidlohr Bueso, mingo, linux-kernel, Davidlohr Bueso

On 09/18, Steven Rostedt wrote:
>
> On Mon, 7 Sep 2020 13:43:02 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Afaics, with or without this change alloc_retstack_tasklist() can race
> > with copy_process() and miss the new child; ftrace_graph_init_task()
> > can't help: ftrace_graph_active can be set right after the check, and
> > for_each_process_thread() can't see the new process yet.
>
> There's a call in copy_process(), ftrace_graph_init_task(), that
> initializes a new task's ret_stack,

Only if ftrace_graph_active != 0.

register_ftrace_graph() can increment ftrace_graph_active and call
alloc_retstack_tasklist() right after ftrace_graph_init_task() checks
ftrace_graph_active.

> and this loop will ignore it

and this loop won't see it unless the forking process finishes
copy_process() and does list_add_tail_rcu(&p->tasks, &init_task.tasks),
which makes it visible to for_each_process(). Yes, this is very unlikely.
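
In other words, the window looks something like this (an interleaving
sketch, not real code):

	copy_process()                      register_ftrace_graph()
	--------------                      -----------------------
	ftrace_graph_init_task()
	  sees ftrace_graph_active == 0,
	  skips allocating a ret_stack
	                                    ftrace_graph_active++
	                                    alloc_retstack_tasklist()
	                                      child not yet on the list,
	                                      so it gets no ret_stack here
	list_add_tail_rcu(&p->tasks, ...)
	  child now visible, but runs
	  without a ret_stack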

> > Looks like ftrace_graph_exit_task() was called by the exiting task
> > in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
> > free the return stack on free_task()"). I think it makes sense to
> > simplify this function now; it can simply do kfree(t->ret_stack) and
> > nothing more.
>
> Ah, yeah, then you are right. If it can't be called on a running task then
> it can be simplified. Probably need a:
>
>  WARN_ON_ONCE(t->on_rq);
>
> just to make sure this never happens.

Well, ftrace_graph_exit_task(t) is called by free_task(t), right before
kmem_cache_free(t).

> > ACK, but ...
> >
> > > @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
> > >  		}
> > >  	}
> > >
> > > -	read_lock(&tasklist_lock);
> >
> > then you should probably rename alloc_retstack_tasklist() ?
> >
>
> tasklist, process thead? Is there a difference?

Aah, please ignore. Somehow I misinterpreted the _tasklist suffix, as if
it referred to tasklist_lock.

Oleg.



* Re: [PATCH] fgraph: Convert ret_stack tasklist scanning to rcu
From: Steven Rostedt @ 2020-09-19 13:56 UTC
  To: Oleg Nesterov; +Cc: Davidlohr Bueso, mingo, linux-kernel, Davidlohr Bueso

On Sat, 19 Sep 2020 13:24:39 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 09/18, Steven Rostedt wrote:
> >
> > On Mon, 7 Sep 2020 13:43:02 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >  
> > > Afaics, with or without this change alloc_retstack_tasklist() can race
> > > with copy_process() and miss the new child; ftrace_graph_init_task()
> > > can't help: ftrace_graph_active can be set right after the check, and
> > > for_each_process_thread() can't see the new process yet.
> >
> > There's a call in copy_process(), ftrace_graph_init_task(), that
> > initializes a new task's ret_stack,
> 
> Only if ftrace_graph_active != 0.
> 
> register_ftrace_graph() can increment ftrace_graph_active and call
> alloc_retstack_tasklist() right after ftrace_graph_init_task() checks
> ftrace_graph_active.
> 
> > and this loop will ignore it  
> 
> and this loop won't see it unless the forking process finishes
> copy_process() and does list_add_tail_rcu(&p->tasks, &init_task.tasks),
> which makes it visible to for_each_process(). Yes, this is very unlikely.

Ah, I see what you mean. Hmm, I'm not sure of the best way to fix this.
It would be very rare to trigger, and the only consequence would be that
the new task doesn't get traced. But that could be frustrating when it
happens. I guess we could move the hook to after the task gets added to
the list, and use a cmpxchg() to update the ret_stack, to make sure we
don't race between initialization and copy_process().
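
Roughly like this, perhaps (a hypothetical sketch; the allocation size
and placement are illustrative only):

	/* after the child is visible on the tasklist: */
	struct ftrace_ret_stack *ret_stack;

	ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
				  sizeof(struct ftrace_ret_stack),
				  GFP_KERNEL);
	if (ret_stack &&
	    cmpxchg(&t->ret_stack, NULL, ret_stack) != NULL)
		kfree(ret_stack);	/* the other side won the race */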

-- Steve

