lttng-dev.lists.lttng.org archive mirror
* [lttng-dev] what will happen during tracing if getcpu() doesn't return correct cpu number occasionally?
@ 2021-07-07  2:37 zhenyu.ren via lttng-dev
  2021-07-07 13:46 ` Mathieu Desnoyers via lttng-dev
  0 siblings, 1 reply; 2+ messages in thread
From: zhenyu.ren via lttng-dev @ 2021-07-07  2:37 UTC
  To: lttng-dev



Hi, 
   I understand that lttng-ust tracepoints use a per-CPU lockless ring buffer algorithm for high performance, and that this relies on getcpu() to return the number of the CPU on which the app is running.
   Unfortunately, I am working on an ARM platform where the Linux kernel does not provide a vDSO implementation of getcpu(), so a single getcpu() call takes 200 ns!
   My questions are:
   1. Do you have any advice for that?
   2. If I implement a cached version of getcpu() (like the getcpu() implementation before kernel 2.6.23), what will happen during tracing?
      The cache would speed up getcpu() calls, at the cost of a very small chance that the returned CPU number is out of date. I am not sure whether such a "wrong" CPU number would crash the traced app.

Thanks
zhenyu.ren


_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: [lttng-dev] what will happen during tracing if getcpu() doesn't return correct cpu number occasionally?
  2021-07-07  2:37 [lttng-dev] what will happen during tracing if getcpu() doesn't return correct cpu number occasionally? zhenyu.ren via lttng-dev
@ 2021-07-07 13:46 ` Mathieu Desnoyers via lttng-dev
  0 siblings, 0 replies; 2+ messages in thread
From: Mathieu Desnoyers via lttng-dev @ 2021-07-07 13:46 UTC
  To: zhenyu.ren; +Cc: lttng-dev



----- On Jul 6, 2021, at 10:37 PM, lttng-dev <lttng-dev@lists.lttng.org> wrote: 

> Hi,
> I understand that lttng-ust tracepoints use a per-CPU lockless ring buffer
> algorithm for high performance, and that this relies on getcpu() to return the
> number of the CPU on which the app is running.
> Unfortunately, I am working on an ARM platform where the Linux kernel does not
> provide a vDSO implementation of getcpu(), so a single getcpu() call takes 200 ns!
> My questions are:
> 1. Do you have any advice for that?

You might want to try wiring up the "rseq" system call in user space to provide an accurate cpu_id
field in a __rseq_abi TLS variable. The kernel always keeps it up to date. The rseq system call
is implemented on ARM. However, the __rseq_abi TLS is a shared resource across libraries, and
we have not agreed with the glibc people on exactly how it must be shared within a process.
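
A minimal sketch of what reading the CPU number through rseq could look like, assuming a Linux
system whose kernel headers ship <linux/rseq.h> and define __NR_rseq. The names example_rseq,
example_getcpu_syscall and example_current_cpu, and the value EXAMPLE_RSEQ_SIG, are hypothetical
and chosen only for illustration. The sketch registers its own per-thread area instead of a
shared __rseq_abi symbol, and it falls back to the plain getcpu system call when registration
fails, which is what I would expect when another library has already registered rseq for the
thread.

```c
#define _GNU_SOURCE
#include <linux/rseq.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical signature value. It only matters for rseq abort handlers,
 * which this sketch does not use. */
#define EXAMPLE_RSEQ_SIG 0x53053053u

/* Per-thread area that the kernel keeps up to date once registered. */
static __thread struct rseq example_rseq = {
	.cpu_id = (uint32_t)RSEQ_CPU_ID_UNINITIALIZED,
};

/* 0 = registration not attempted, 1 = registered, -1 = unavailable. */
static __thread int example_rseq_state;

/* Slow path: a real getcpu system call. Returns -1 on failure. */
int example_getcpu_syscall(void)
{
	unsigned int cpu = 0;

	if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
		return -1;
	return (int)cpu;
}

/* Fast path when rseq is registered: a single load from TLS. */
int example_current_cpu(void)
{
#ifdef __NR_rseq
	if (example_rseq_state == 0) {
		long rc = syscall(__NR_rseq, &example_rseq,
				  sizeof(example_rseq), 0, EXAMPLE_RSEQ_SIG);
		example_rseq_state = (rc == 0) ? 1 : -1;
	}
	if (example_rseq_state == 1)
		return (int)__atomic_load_n(&example_rseq.cpu_id,
					    __ATOMIC_RELAXED);
#endif
	return example_getcpu_syscall();
}
```

Each thread pays for the registration attempt once. After that, example_current_cpu() is either
a TLS load or the same system call the application was already making.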

> 2. If I implement a cached version of getcpu() (like the getcpu() implementation
> before kernel 2.6.23), what will happen during tracing?

You'd have to give more details on how this "cache-version" works. 
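
For the sake of discussion, here is one hypothetical shape such a cache could take. This is a
sketch under my own assumptions, not the historical kernel mechanism and not anything
lttng-ust provides: the names example_cached_getcpu and EXAMPLE_REFRESH_INTERVAL are made up,
and the refresh policy (one real getcpu system call every N lookups per thread) is only one of
several possible choices.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical policy: number of lookups served from the cache between
 * two real getcpu system calls. */
#define EXAMPLE_REFRESH_INTERVAL 64

static __thread int example_cached_cpu = -1;         /* -1 = nothing cached */
static __thread unsigned int example_lookups_left;   /* lookups before refresh */

/* Returns the cached CPU number, refreshing it when the cache is empty
 * or its budget of lookups is used up. The value may be stale if the
 * thread migrated since the last refresh. */
int example_cached_getcpu(void)
{
	if (example_cached_cpu < 0 || example_lookups_left == 0) {
		unsigned int cpu = 0;

		if (syscall(SYS_getcpu, &cpu, NULL, NULL) == 0)
			example_cached_cpu = (int)cpu;
		else
			example_cached_cpu = -1;
		example_lookups_left = EXAMPLE_REFRESH_INTERVAL;
	}
	example_lookups_left--;
	return example_cached_cpu;
}
```

With a scheme like this, the returned number is always one that the kernel reported at some
point, so it stays within the set of possible CPUs. It can only be out of date.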

> The cache would speed up getcpu() calls, at the cost of a very small chance that
> the returned CPU number is out of date. I am not sure whether such a "wrong" CPU
> number would crash the traced app.

LTTng-UST always has to expect that the thread can be migrated at any point between getcpu and the
writes to per-cpu data. It therefore always relies on atomic operations when interacting with the
ring buffer, and it does not expect to run on the "right" CPU, relative to the ring buffer data
structure, for consistency. As a result, you won't experience crashes or corruption even if the CPU
number is wrong once in a while, as long as it belongs to the "possible CPUs".
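
To illustrate the idea (this is a simplified sketch, not the actual libringbuffer code), space
in a per-CPU buffer can be reserved with a compare-and-swap loop, so that two threads using
the same buffer index never receive overlapping slots, whichever CPU they really run on. All
names here (example_cpu_buf, example_reserve, EXAMPLE_NR_POSSIBLE_CPUS, EXAMPLE_BUF_SIZE) are
hypothetical, and the sketch does not wrap around like a real ring buffer.

```c
#include <stdatomic.h>
#include <stddef.h>

#define EXAMPLE_NR_POSSIBLE_CPUS 4    /* hypothetical number of possible CPUs */
#define EXAMPLE_BUF_SIZE 4096         /* hypothetical per-CPU buffer size */

struct example_cpu_buf {
	atomic_size_t write_offset;   /* next free byte, updated atomically */
	char data[EXAMPLE_BUF_SIZE];
};

static struct example_cpu_buf example_bufs[EXAMPLE_NR_POSSIBLE_CPUS];

/* Reserve len bytes in the buffer selected by cpu_hint.
 * Returns the start offset of the reserved slot, or (size_t)-1 when the
 * hint is outside the possible CPUs or the buffer has no room left.
 * The hint only selects a buffer; correctness comes from the CAS. */
size_t example_reserve(int cpu_hint, size_t len)
{
	struct example_cpu_buf *buf;
	size_t old;

	if (cpu_hint < 0 || cpu_hint >= EXAMPLE_NR_POSSIBLE_CPUS)
		return (size_t)-1;

	buf = &example_bufs[cpu_hint];
	old = atomic_load_explicit(&buf->write_offset, memory_order_relaxed);
	do {
		if (len > EXAMPLE_BUF_SIZE - old)
			return (size_t)-1;
		/* On failure, the CAS reloads old with the current offset. */
	} while (!atomic_compare_exchange_weak_explicit(&buf->write_offset,
			&old, old + len,
			memory_order_acq_rel, memory_order_relaxed));
	return old;
}
```

A stale CPU number only means that the record lands in another CPU's buffer. An out-of-range
number is the case that has to be rejected, which is why the hint is checked first.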

Internally, this behavior corresponds to the libringbuffer "RING_BUFFER_SYNC_GLOBAL" configuration
option, which lttng-ust selects.

Note that the kernel tracer instead selects "RING_BUFFER_SYNC_PER_CPU", which is faster, but 
requires that preemption (or migration) be disabled between the "smp_processor_id()" and writes to 
the ring buffer per-cpu data structures. 

Thanks, 

Mathieu 

> Thanks
> zhenyu.ren


-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 

