Re: [lttng-dev] Unexport of kvm_x86_ops vs tracer modules

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	lttng-dev <lttng-dev@lists.lttng.org>,
	rostedt <rostedt@goodmis.org>, KVM list <kvm@vger.kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [lttng-dev] Unexport of kvm_x86_ops vs tracer modules
Date: Mon, 25 Apr 2022 09:00:22 -0400 (EDT)	[thread overview]
Message-ID: <358746537.34457.1650891622938.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <1622857974.11247.1649441213797.JavaMail.zimbra@efficios.com>

----- On Apr 8, 2022, at 2:06 PM, Mathieu Desnoyers via lttng-dev lttng-dev@lists.lttng.org wrote:

> ----- On Apr 8, 2022, at 12:24 PM, Paolo Bonzini pbonzini@redhat.com wrote:
> 
>> On 4/8/22 17:36, Mathieu Desnoyers wrote:
>>> LTTng is an out of tree kernel module, which currently relies on the export.
>>> Indeed, arch/x86/kvm/x86.c exports a set of tracepoints to kernel modules, e.g.:
>>> 
>>> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry)
>>> 
>>> But any probe implementation hooking on that tracepoint would need kvm_x86_ops
>>> to translate the struct kvm_vcpu * into meaningful tracing data.
>>> 
>>> I could work-around this on my side in ugly ways, but I would like to discuss
>>> how kernel module tracers are expected to implement kvm events probes without
>>> the kvm_x86_ops symbol ?
>> 
>> The conversion is done in the TP_fast_assign snippets, which are part of
>> kvm.ko and therefore do not need the export.  As I understand it, the
>> issue is that LTTng cannot use the TP_fast_assign snippets, because they
>> are embedded in the trace_event_raw_event_* symbols?
> 
> Indeed, the fact that the TP_fast_assign snippets are embedded in the
> trace_event_raw_event_* symbols is an issue for LTTng. This ties those
> to ftrace.
> 
> AFAIK, TP_fast_assign copies directly into ftrace ring buffers, and then
> afterwards things like dynamic filters are applied, which then "uncommits" the
> events if need be (and if possible). Also, TP_fast_assign is tied to the
> ftrace ring buffer event layout. The fact that the TP_STRUCT__entry()
> (description)
> and TP_fast_assign() (open-coded C) are separate fields really focuses on a
> use-case where all data is serialized to a ring buffer.
> 
> In LTTng, the event fields are made available to a filter interpreter prior to
> being copied into LTTng's ring buffer. This is made possible by implementing
> our own LTTNG_TRACEPOINT_EVENT code generation headers. In addition, we have
> recently released an event notification mechanism (lttng 2.13) which captures
> specific event fields to send with an immediate notification (thus bypassing the
> tracer buffering). We are also currently working on a LTTng trace hit counters
> mechanism, which performs aggregation through per-cpu counters, which doesn't
> even allocate a ring buffer.
> 
> For those reasons, LTTng reimplements its own tracepoint probe callbacks. All
> those sit within LTTng kernel modules, which means we currently need the
> exported
> kvm_x86_ops callbacks.
> 
>> We cannot do the extraction before calling trace_kvm_exit, because it's
>> expensive.
> 
> I suspect that extracting relevant data prior to calling trace_kvm_exit
> is too expensive because it cannot be skipped when the tracepoint is
> disabled. This is because trace_kvm_exit() is a static inline function,
> and the check to figure out if the event is enabled is within that function.
> Unfortunately, even if the tracepoint is disabled, the side-effects of the
> parameters passed to trace_kvm_exit() must happen.
> 
> I've solved this in LTTng-UST by implementing a lttng_ust_tracepoint()
> macro, which basically "lifts" the tracepoint enabled check before the
> evaluation of the arguments.
> 
> You could achieve something similar by using trace_kvm_exit_enabled() in the
> kernel like so:
> 
>  if (trace_kvm_exit_enabled())
>      trace_kvm_exit(....);
> 
> Which would skip evaluation of the argument side-effects when the tracepoint is
> disabled.
> 
> By doing that, when multiple tracers are attached to a kvm tracepoint, the
> translation from pointer-to-internal-structure to meaningful fields would only
> need to be done once when a tracepoint is hit. And this would remove the need
> for using kvm_x86_ops callbacks from tracer probe functions.
> 
> Thoughts ?

Hi Paolo,

We are at 5.18-rc4 now. Should I expect this unexport to stay in place
for 5.18 final and go ahead with using kallsyms to find this symbol from
lttng-modules instead ?

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com