From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: paulmck@linux.vnet.ibm.com
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Jiri Olsa <jolsa@redhat.com>
Subject: Re: Re: [RFC][PATCH] ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched()
Date: Thu, 19 Jun 2014 16:18:37 +0900
Message-ID: <53A28ECD.7000503@hitachi.com>
In-Reply-To: <20140619022801.GC4669@linux.vnet.ibm.com>

(2014/06/19 11:28), Paul E. McKenney wrote:
> On Wed, Jun 18, 2014 at 09:56:26PM -0400, Steven Rostedt wrote:
>>
>> Another blast from the past (from the book of cleaning out inbox)
>>
>> On Wed, 29 May 2013 09:52:49 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> On Tue, May 28, 2013 at 08:01:16PM -0400, Steven Rostedt wrote:
>>>> The function tracer uses preempt_disable/enable_notrace() for
>>>> synchronization between reading registered ftrace_ops and unregistering
>>>> them.
>>>>
>>>> Most of the ftrace_ops are global permanent structures that do not
>>>> require this synchronization. That is, ops may be added and removed from
>>>> the hlist but are never freed, and won't hurt if a synchronization is
>>>> missed.
>>>>
>>>> But this is not true for dynamically created ftrace_ops or control_ops,
>>>> which are used by the perf function tracing.
>>>>
>>>> The problem here is that the function tracer can be used to trace
>>>> kernel/user context switches as well as going to and from idle.
>>>> Basically, it can be used to trace blind spots of the RCU subsystem.
>>>> This means that even though preempt_disable() is done, a
>>>> synchronize_sched() will ignore CPUs that haven't made it out of user
>>>> space or idle. These can include functions that are being traced just
>>>> before entering or exiting the kernel sections.
>>>
>>> Just to be clear, it's the idle part that's a problem, right? Being stuck
>>> in userspace isn't a problem since if that CPU is in userspace it's
>>> certainly not got a reference to whatever list entry we're removing.
>>>
>>> Now when the CPU really is idle, it's obviously not using tracing either;
>>> so only the gray area where RCU thinks we're idle but we're not actually
>>> idle is a problem?
>>>
>>> Is there something a little smarter we can do? Could we use
>>> on_each_cpu_cond() with a function that checks if the CPU really is
>>> fully idle?
>>>
>>>> To implement the RCU synchronization, schedule_on_each_cpu() is used
>>>> instead of synchronize_sched(). This means that when a dynamically
>>>> allocated ftrace_ops or a control ops is being unregistered, all CPUs
>>>> must be touched and execute a ftrace_sync() stub function via the work
>>>> queues. This will rip CPUs out of idle or dynamic tick mode. This only
>>>> happens when a user disables perf function tracing or other dynamically
>>>> allocated function tracers, but it allows us to continue to debug RCU
>>>> and context tracking with function tracing.
>>>
>>> I don't suppose there's anything perf can do about this, right? Since
>>> it's all on user demand we're kinda stuck with dynamic memory.
>>
>> If Paul finished his "synchronize_all_tasks_scheduled()" then we could
>> use that instead. Where "synchronize_all_tasks_scheduled()" would
>> return after all tasks have either scheduled, are in userspace, or are
>> idle (that is, not on the run queue). And scheduled means a non-preempted
>> schedule, where the task itself actually called schedule.

This is also good for jump-optimized kprobes on a preemptive kernel.
Since there is no way to ensure that no one is running on a dynamically
allocated code buffer, the jump optimization is currently disabled
with CONFIG_PREEMPT. However, this new API can allow us to enable it.
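
To make the condition concrete, here is a rough userspace model of the
rule Steven describes above: a task stops blocking the wait once it has
scheduled voluntarily, is in userspace, or is off the run queue, while a
preempted task keeps blocking it. This is only a sketch under stated
assumptions. Every name in it (sim_task, sim_gp_start, sim_gp_poll and
so on) is made up for illustration and is not kernel code or the
proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model for illustration; not a kernel structure. */
enum sim_state { SIM_ON_CPU, SIM_PREEMPTED, SIM_SLEEPING, SIM_IN_USER };

struct sim_task {
	enum sim_state state;
	unsigned long nvcsw;	/* voluntary context switches so far */
	unsigned long snap;	/* nvcsw recorded when the wait began */
	bool holdout;		/* true while this task still blocks the wait */
};

/* Begin a wait: snapshot counters and mark tasks that may be in kernel code. */
static void sim_gp_start(struct sim_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].snap = t[i].nvcsw;
		/* Running or preempted tasks may still sit in the code buffer. */
		t[i].holdout = (t[i].state == SIM_ON_CPU ||
				t[i].state == SIM_PREEMPTED);
	}
}

/* Re-check the holdouts; returns how many tasks still block the wait. */
static size_t sim_gp_poll(struct sim_task *t, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (!t[i].holdout)
			continue;
		if (t[i].nvcsw != t[i].snap ||
		    t[i].state == SIM_IN_USER ||
		    t[i].state == SIM_SLEEPING)
			t[i].holdout = false;	/* passed a safe point */
		else
			left++;
	}
	return left;
}

/* Involuntary switch: the counter is untouched, so the task stays a holdout. */
static void sim_preempt(struct sim_task *t)
{
	assert(t->state == SIM_ON_CPU);
	t->state = SIM_PREEMPTED;
}

/* The task itself calls schedule(): this is what ends its holdout status. */
static void sim_voluntary_schedule(struct sim_task *t)
{
	t->nvcsw++;
	t->state = SIM_SLEEPING;
}
```

The model relies on the assumption that code in the dynamically
allocated buffer never calls schedule() itself, so a voluntary switch
proves the task has left it. Once sim_gp_poll() returns 0, the buffer
could be freed in this model.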

>>
>> Paul, how you doing on that? You said you could have something by 3.17.
>> That's coming up quick :-)
> 
> I am still expecting to, despite my misadventures with performance
> regressions.

Nice. I look forward to that :-)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




