----- On Mar 22, 2018, at 12:24 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:Actually, i am not sure if this would help. I have been trying to reproduce the problem on a different machine, but I can't. Even without the patches !!!!!Does it have the same glibc version ?Thanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Wed, Mar 21, 2018 at 8:13 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 21, 2018, at 8:01 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:LTTNG_INST is just a preprocessor flag I have and tp.o is my custom tracepointsJust to clarify more what I meant by custom events; I have new tracepoints added in the source code of the benchmark. However, I don't enable the corresponding events when I do the actual tracing.This is the sequence followed in building the benchmark:
gcc-7.2 -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -DLTTNG_INST lu.c
gcc-7.2 -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -DLTTNG_INST -o LU_NCB lu.o ../../instrumentation/lttng_tp/tp.o -lm -llttng-ust -ldl Could you share a repository with your custom instrumentation on github, so I couldtry it out ?My current problem is that I cannot reproduce your issue on my end.Thanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Wed, Mar 21, 2018 at 7:55 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:Still running into same problem. I attached the debug trace I got after applying the 2 patches.The benchmark I am running also includes some custom created tracepoints. I am not adding the events being traced in the files I have provided. Do you think this might be causing a problem when I have tracpoints from 2 different packages?Shehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:Thanks you very much, and sorry again for the late replies.2- I run into this problem only for some of the benchmarks (or at least the problems happens much less frequently for others that I didn't run into it, not yet)1- The error actually happens at the end of the application. It actually finishes execution, but then something goes wrong.Also, I am not sure if these points make any difference:I will try the updated patch and let you know how it goes.I am so sorry for the late replies.I have tried the first patch you sent and the problem still happens (although seemingly less frequently especially with debugging enabled!!!!). I have attached the output I got from one of the erroneous runs.No worries! Looking through your log, I notice that pthread set cancel state has problems whencalled from application threads. We do not restore the original thread's cancel state. Can you trywith the attached patch applied on top of the previous one ?Thanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid " and made sure the value was actually written by "cat /proc/sys/kernel/perf_event_paranoid" It executed normally 2 times but then got the same error.Can you provide the output when reproducing the issue with theLTTNG_UST_DEBUG=1 environment variable set when startingyour test program ?I just noticed something that might explain what goes wrong here:lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). Pthread mutexin your case is instrumented with the pthread wrapper. This "add_thread_field" is invokedthe first time the perf counter is hit by each given thread. When this happens, theinstrumented pthread mutex will try to call into the instrumentation tracepoint again,which will call add_thread_field() (again), and so on until we reach the libringbuffer"lib_ring_buffer_nesting" threshold of 4 calls deep.I suspect this situation where we recursively call add_thread_field is unexpected,which may trigger your double-free here.Will investigate more.Can you try with the attached patch applied ?Here is an updated v2 of the patch which tests the notrace tls counter sooner (before evaluatingfilter). Please let me know how it goes.Thanks,MathieuThanks,MathieuThanks,MathieuThanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:
----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:The glibc version I get by running "ldd --version" is 2.17cat /proc/sys/kernel/perf_event_I run the program as a normal userparanoid ---> returns 1Can you reproduce the issue after you do this as root ?echo "-1" > /proc/sys/kernel/perf_event_paranoid Based on this documentation of the Linux kernel:Documentation/sysctl/kernel.txt: perf_event_paranoid:
Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN). The default value is 2.
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMINThanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:Yes, I tried with only one of those contexts and I still ran into the same problem.What is the setting returned bycat /proc/sys/kernel/perf_event_paranoid on your system ? And do you run your test program as root or normal user ?Please CC the mailing list on your reply.I will also need to know what glibc version you have on your system.Thanks,MathieuThanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:7- lttng destroy6- lttng stop4- lttng start3- lttng add-context -u -t procname2- lttng enable-event --userspace lttng_ust_pthread:pthread_1- lttng create lu_ncb_8_native -o {path}Hi Mathieu,This is how I set up the tracing session:
Thank you very much for your reply.
I manually built lttng-ust from source (commit #: 8a208943e21700211beee3ea64180a5a534c7d2a). mutex_lock_req
lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_acq
lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_trylock
lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_unlock
lttng add-context -u -t vpid
lttng add-context -u -t pthread_id
lttng add-context -u -t vtid
lttng add-context -u -t ip
lttng add-context -u -t perf:thread:cpu-cycles
lttng add-context -u -t perf:thread:cycles
lttng add-context -u -t perf:thread:instructions
5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper. so ./lu_ncb -p8 -n8096 -b32 Can you reproduce if you remove the following contexts ?lttng add-context -u -t perf:thread:cpu-cycles
lttng add-context -u -t perf:thread:cycles
lttng add-context -u -t perf:thread:instructionsAnd if you only keep a single of those contexts ?Thanks,MathieuShehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn@gmail.comOn Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com > wrote:----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed <shehabyomn@gmail.com> wrote:Hello All,I am trying to instrument a pthread application using the provided pthread wrapper, but I sometimes run into a "Double free or corruption error (fasttop)" error.Please provide more information about the version of lttng-ust you are using, and how you setupyour tracing session.Thanks,Mathieu______________________________ShehabThanks,Any ideas, why this happens and how to fix it?5- Also, this is a backtrace I obtained from gdb:4- Here is the backtrace printed after exiting:3- I am testing using lu_ncb from splash3 benchmark suite after setting LD_PRELOAD to use the pthread wrapper as described in the LTTng documents.2- From my experiments, the problem happens (more frequently at least) when adding performance counter contexts (I tried cycles, cpu_cycles and instructions).1- The problem isn't consistent. It sometimes happen and sometimes works as expected.Here is a description of what I have tried and noticed:
======= Backtrace: =========
/lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited]
/usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+ 0x35)[0x7ffff7471575]
/usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+ 0x21c)[0x7ffff747363c]
/usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906]
/usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ 0x9f)[0x7ffff746dccf]
/usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ 0x9f)[0x7ffff746dccf]
/usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ 0x9f)[0x7ffff746dccf]
/usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[ 0x7ffff746ead8]
/usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3]
/usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[ 0x7ffff745ed57]
/lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a]
/lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49]
/lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95]
/aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35]
/aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44]
#0 0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6
#2 0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6
#4 0x00007ffff768ad25 in lttng_destroy_perf_counter_field (
field=<optimized out>) at lttng-context-perf-counters.c:418
#5 0x00007ffff767a575 in lttng_destroy_context (
ctx=0x7ffff0011090) at lttng-context.c:278
#6 0x00007ffff767c63c in _lttng_channel_unmap (
lttng_chan=0x7ffff0010f40) at lttng-events.c:172
#7 lttng_session_destroy (session=0x7ffff0000900)
at lttng-events.c:247
#8 0x00007ffff7676906 in lttng_release_session (
objd=<optimized out>) at lttng-ust-abi.c:601
#9 0x00007ffff7676ccf in lttng_ust_objd_unref (id=1,
is_owner=<optimized out>) at lttng-ust-abi.c:216
#10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2,
is_owner=<optimized out>) at lttng-ust-abi.c:216
#11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id@entry=18,
is_owner=is_owner@entry=1) at lttng-ust-abi.c:216
#12 0x00007ffff7677ad8 in objd_table_destroy ()
at lttng-ust-abi.c:235
#13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002
#14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1)
at lttng-ust-comm.c:1807
#15 0x00007ffff7667d57 in lttng_ust_exit ()
at lttng-ust-comm.c:1874
#16 0x00007ffff7dec85a in _dl_fini ()
from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff6eafa49 in __run_exit_handlers ()
from /lib64/libc.so.6
#18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6
#19 0x0000000000401b51 in main (argc=<optimized out>,
argv=<optimized out>) at lu.c:368_________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ----------