From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rostedt <rostedt@goodmis.org>
Cc: Michael Jeanson <mjeanson@efficios.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Alexei Starovoitov <ast@kernel.org>, Yonghong Song <yhs@fb.com>,
	paulmck <paulmck@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	acme <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [RFC PATCH 0/6] [RFC] Faultable tracepoints (v2)
Date: Thu, 25 Feb 2021 16:46:30 -0500 (EST)
Message-ID: <1130245502.6977.1614289590089.JavaMail.zimbra@efficios.com>
In-Reply-To: <20210224131405.20d64b49@gandalf.local.home>



----- On Feb 24, 2021, at 1:14 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 24 Feb 2021 11:59:35 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> 
>> As a prototype solution, what I've done currently is to copy the user-space
>> data into a kmalloc'd buffer in a preparation step before disabling preemption
>> and copying data over into the per-cpu buffers. It works, but I think we should
>> be able to do it without the needless copy.
>> 
>> What I have in mind as an efficient solution (not implemented yet) for the LTTng
>> kernel tracer goes as follows:
>> 
>> #define COMMIT_LOCAL 0
>> #define COMMIT_REMOTE 1
>> 
>> - faultable probe is called from system call tracepoint
>>   [ preemption/blocking/migration is allowed ]
>>   - probe code calculates the length which needs to be reserved to store the event
>>     (e.g. user strlen),
>> 
>>   - preempt disable -> [ preemption/blocking/migration is not allowed from here ]
>>     - reserve_cpu = smp_processor_id()
>>     - reserve space in the ring buffer for reserve_cpu
>>       [ from that point on, we have _exclusive_ access to write into the ring buffer
>>         "slot" from any cpu until we commit. ]
>>   - preempt enable -> [ preemption/blocking/migration is allowed from here ]
>> 
> 
> So basically the commit position here doesn't move until this task is
> scheduled back in and the commit (remote or local) is updated.

Indeed.
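
To make the reserve/commit behaviour discussed above concrete, here is a
minimal single-threaded user-space model in C. It is only a sketch under
stated assumptions, not LTTng code: the struct layout, the rb_reserve(),
rb_write() and rb_commit() helpers, the fixed sizes, and the absence of
wrap-around and of real atomics are all hypothetical. It also assumes that a
commit counts as COMMIT_LOCAL when it runs on reserve_cpu and as
COMMIT_REMOTE otherwise. The point it illustrates is that the commit position
of a per-cpu buffer stays put until the oldest reserved slot is committed,
from whichever cpu that happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMMIT_LOCAL  0
#define COMMIT_REMOTE 1

#define BUF_SIZE  64   /* hypothetical per-cpu buffer size, no wrap-around */
#define MAX_SLOTS 8    /* hypothetical bound on outstanding reservations */

struct slot {
	size_t offset;
	size_t len;
	bool committed;
};

struct cpu_buffer {
	char data[BUF_SIZE];
	size_t reserve_pos;    /* next free byte */
	size_t commit_pos;     /* bytes before this are visible to a reader */
	struct slot slots[MAX_SLOTS];
	size_t nr_slots;
	size_t first_pending;  /* oldest slot not yet accounted in commit_pos */
};

struct reservation {
	int reserve_cpu;
	size_t slot_idx;
};

/* Models the short preempt-disabled section: pick the cpu, reserve space. */
static int rb_reserve(struct cpu_buffer *bufs, int cpu, size_t len,
		      struct reservation *res)
{
	struct cpu_buffer *b = &bufs[cpu];

	if (b->nr_slots == MAX_SLOTS || len > BUF_SIZE - b->reserve_pos)
		return -1;
	res->reserve_cpu = cpu;
	res->slot_idx = b->nr_slots;
	b->slots[b->nr_slots++] = (struct slot){
		.offset = b->reserve_pos, .len = len, .committed = false,
	};
	b->reserve_pos += len;
	return 0;
}

/* Models the faultable copy: it targets reserve_cpu's buffer from any cpu. */
static void rb_write(struct cpu_buffer *bufs, const struct reservation *res,
		     const char *src)
{
	struct cpu_buffer *b = &bufs[res->reserve_cpu];
	const struct slot *s = &b->slots[res->slot_idx];

	memcpy(b->data + s->offset, src, s->len);
}

/* Marks the slot committed, then advances commit_pos over every contiguous
 * committed slot starting from the oldest pending one. */
static int rb_commit(struct cpu_buffer *bufs, const struct reservation *res,
		     int current_cpu)
{
	struct cpu_buffer *b = &bufs[res->reserve_cpu];

	b->slots[res->slot_idx].committed = true;
	while (b->first_pending < b->nr_slots &&
	       b->slots[b->first_pending].committed) {
		b->commit_pos += b->slots[b->first_pending].len;
		b->first_pending++;
	}
	return current_cpu == res->reserve_cpu ? COMMIT_LOCAL : COMMIT_REMOTE;
}
```

In this model, if slot A is reserved before slot B on cpu 0 and B commits
first, commit_pos stays at 0. It only jumps past both slots once A commits,
even if the task holding A has migrated to another cpu in the meantime.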

> To put it in terms of the ftrace ring buffer, where we have both a commit
> page and a commit index, and it only gets moved by the first one to start a
> commit stack (that is, interrupts that interrupted a write will not
> increment the commit).

The tricky part for ftrace is its reliance on the fact that the concurrent
users of the per-cpu ring buffer are all nested contexts. LTTng does not
assume that and has been designed to be used both in kernel and user-space:
lttng-modules and lttng-ust share a lot of ring buffer code. Therefore,
LTTng's ring buffer supports preemption/migration of concurrent contexts.

The fact that LTTng uses local-atomic-ops on its kernel ring buffers is just
an optimization on an overall ring buffer design meant to allow preemption.

> Now, I'm not sure how LTTng does it, but I could see issues for ftrace to
> try to move the commit pointer (the pointer to the new commit page), as the
> design is currently dependent on the fact that it can't happen while
> commits are taking place.

Indeed, what makes it easy for LTTng is that the ring buffer has been
designed to support preemption/migration from the ground up.

> Are the pages of the LTTng indexed by an array of pages?

Yes, they are. Handling the initial page allocation and then the tracer copy of data
to/from the ring buffer pages is the responsibility of the LTTng lib ring buffer "backend".
The LTTng lib ring buffer backend is somewhat similar to a page table done in software, where
the top level of the page table can be dynamically updated when doing flight recorder tracing.

It is however completely separate from the space reservation/commit scheme which is handled
by the lib ring buffer "frontend".

The algorithm I described in my prior email is specifically targeted at the frontend layer,
leaving the "backend" unchanged.
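
As an illustration of the "page table done in software" idea for the backend,
here is a small user-space C sketch. It is an assumption-laden model, not the
LTTng lib ring buffer API: struct backend, backend_write(),
backend_swap_page(), and the tiny PAGE_SZ/NR_PAGES values are hypothetical
names and sizes chosen for readability. Swapping a page pointer is only one
plausible reading of "dynamically updating the top level". The sketch shows
that a copy by buffer offset is routed through an array of page pointers, so
the table can be changed without the frontend's offsets knowing about it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ  16  /* hypothetical tiny page size */
#define NR_PAGES 4   /* hypothetical number of pages in the buffer */

struct backend {
	char *pages[NR_PAGES];  /* top level: logical page index -> page */
};

/* Copy len bytes at a logical buffer offset, splitting across pages. */
static void backend_write(struct backend *be, size_t offset,
			  const void *src, size_t len)
{
	const char *p = src;

	while (len > 0) {
		size_t idx = (offset / PAGE_SZ) % NR_PAGES;
		size_t in_page = offset % PAGE_SZ;
		size_t chunk = PAGE_SZ - in_page;

		if (chunk > len)
			chunk = len;
		memcpy(be->pages[idx] + in_page, p, chunk);
		offset += chunk;
		p += chunk;
		len -= chunk;
	}
}

/* Replace one top-level entry with a spare page and hand back the old page,
 * e.g. so that a reader can consume it while writers keep going. */
static char *backend_swap_page(struct backend *be, size_t idx, char *spare)
{
	char *old = be->pages[idx];

	be->pages[idx] = spare;
	return old;
}
```

Nothing in this sketch tracks reserve or commit positions: those would live
in the frontend, which only hands offsets and lengths to the backend.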

For some reason, I suspect the Ftrace ring buffer combines those two layers into a single
algorithm, which may have its advantages, but seems to strengthen its dependency on
only having nested contexts sharing a given per-cpu ring buffer.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

