Just to clarify more what I meant by custom events; I have new tracepoints added in the source code of the benchmark. However, I don't enable the corresponding events when I do the actual tracing. This is the sequence followed in building the benchmark: gcc-7.2 -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -DLTTNG_INST lu.c gcc-7.2 -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -g -fno-strict-aliasing -DLTTNG_INST -o LU_NCB lu.o ../../instrumentation/lttng_tp/tp.o -lm -llttng-ust -ldl LTTNG_INST is just a preprocessor flag I have and tp.o is my custom tracepoints Shehab Y. Elsayed, MSc. PhD Student The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering University of Toronto E-mail: shehabyomn@gmail.com On Wed, Mar 21, 2018 at 7:55 PM, Shehab Elsayed wrote: > Still running into same problem. I attached the debug trace I got after > applying the 2 patches. > > The benchmark I am running also includes some custom created tracepoints. > I am not adding the events being traced in the files I have provided. Do > you think this might be causing a problem when I have tracpoints from 2 > different packages? > > Shehab Y. Elsayed, MSc. > PhD Student > The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering > University of Toronto > E-mail: shehabyomn@gmail.com > > > On Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers < > mathieu.desnoyers@efficios.com> wrote: > >> ----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed >> wrote: >> >> I am so sorry for the late replies. >> >> I have tried the first patch you sent and the problem still happens >> (although seemingly less frequently especially with debugging enabled!!!!). >> I have attached the output I got from one of the erroneous runs. >> >> I will try the updated patch and let you know how it goes. >> >> Also, I am not sure if these points make any difference: >> 1- The error actually happens at the end of the application. It actually >> finishes execution, but then something goes wrong. >> 2- I run into this problem only for some of the benchmarks (or at least >> the problems happens much less frequently for others that I didn't run into >> it, not yet) >> >> Thanks you very much, and sorry again for the late replies. >> >> >> No worries! Looking through your log, I notice that pthread set cancel >> state has problems when >> called from application threads. We do not restore the original thread's >> cancel state. Can you try >> with the attached patch applied on top of the previous one ? >> >> Thanks, >> >> Mathieu >> >> >> >> >> Shehab Y. Elsayed, MSc. >> PhD Student >> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >> University of Toronto >> E-mail: shehabyomn@gmail.com >> >> >> On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers < >> mathieu.desnoyers@efficios.com> wrote: >> >>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers < >>> mathieu.desnoyers@efficios.com> wrote: >>> >>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers < >>> mathieu.desnoyers@efficios.com> wrote: >>> >>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers < >>> mathieu.desnoyers@efficios.com> wrote: >>> >>> >>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed >>> wrote: >>> >>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid" and made sure >>> the value was actually written by "cat /proc/sys/kernel/perf_event_pa >>> ranoid" >>> >>> It executed normally 2 times but then got the same error. >>> >>> >>> Can you provide the output when reproducing the issue with the >>> LTTNG_UST_DEBUG=1 environment variable set when starting >>> your test program ? >>> >>> I just noticed something that might explain what goes wrong here: >>> >>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). >>> Pthread mutex >>> in your case is instrumented with the pthread wrapper. This >>> "add_thread_field" is invoked >>> the first time the perf counter is hit by each given thread. When this >>> happens, the >>> instrumented pthread mutex will try to call into the instrumentation >>> tracepoint again, >>> which will call add_thread_field() (again), and so on until we reach the >>> libringbuffer >>> "lib_ring_buffer_nesting" threshold of 4 calls deep. >>> >>> I suspect this situation where we recursively call add_thread_field is >>> unexpected, >>> which may trigger your double-free here. >>> >>> Will investigate more. >>> >>> Can you try with the attached patch applied ? >>> >>> Here is an updated v2 of the patch which tests the notrace tls counter >>> sooner (before evaluating >>> filter). Please let me know how it goes. >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> >>> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> >>> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> >>> >>> Shehab Y. Elsayed, MSc. >>> PhD Student >>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>> University of Toronto >>> E-mail: shehabyomn@gmail.com >>> >>> >>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < >>> mathieu.desnoyers@efficios.com> wrote: >>> >>>> >>>> >>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed >>>> wrote: >>>> >>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1 >>>> >>>> I run the program as a normal user >>>> >>>> The glibc version I get by running "ldd --version" is 2.17 >>>> >>>> >>>> Can you reproduce the issue after you do this as root ? >>>> >>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid >>>> >>>> Based on this documentation of the Linux kernel: >>>> >>>> Documentation/sysctl/kernel.txt: >>>> >>>> perf_event_paranoid: >>>> >>>> Controls use of the performance events system by unprivileged >>>> users (without CAP_SYS_ADMIN). The default value is 2. >>>> >>>> -1: Allow use of (almost) all events by all users >>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN >>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN >>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> Shehab Y. Elsayed, MSc. >>>> PhD Student >>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>> University of Toronto >>>> E-mail: shehabyomn@gmail.com >>>> >>>> >>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < >>>> mathieu.desnoyers@efficios.com> wrote: >>>> >>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < >>>>> mathieu.desnoyers@efficios.com> wrote: >>>>> >>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < >>>>> shehabyomn@gmail.com> wrote: >>>>> >>>>> Yes, I tried with only one of those contexts and I still ran into the >>>>> same problem. >>>>> >>>>> What is the setting returned by >>>>> >>>>> cat /proc/sys/kernel/perf_event_paranoid >>>>> >>>>> on your system ? And do you run your test program as root or normal >>>>> user ? >>>>> >>>>> Please CC the mailing list on your reply. >>>>> >>>>> >>>>> I will also need to know what glibc version you have on your system. >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> Shehab Y. Elsayed, MSc. >>>>> PhD Student >>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>> University of Toronto >>>>> E-mail: shehabyomn@gmail.com >>>>> >>>>> >>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < >>>>> mathieu.desnoyers@efficios.com> wrote: >>>>> >>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < >>>>>> shehabyomn@gmail.com> wrote: >>>>>> >>>>>> Hi Mathieu, >>>>>> >>>>>> Thank you very much for your reply. >>>>>> >>>>>> I manually built lttng-ust from source (commit #: >>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a). >>>>>> >>>>>> This is how I set up the tracing session: >>>>>> 1- lttng create lu_ncb_8_native -o {path} >>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_mute >>>>>> x_lock_req >>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mute >>>>>> x_lock_acq >>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mute >>>>>> x_lock_trylock >>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mute >>>>>> x_lock_unlock >>>>>> 3- lttng add-context -u -t procname >>>>>> lttng add-context -u -t vpid >>>>>> lttng add-context -u -t pthread_id >>>>>> lttng add-context -u -t vtid >>>>>> lttng add-context -u -t ip >>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>> lttng add-context -u -t perf:thread:cycles >>>>>> lttng add-context -u -t perf:thread:instructions >>>>>> 4- lttng start >>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so >>>>>> ./lu_ncb -p8 -n8096 -b32 >>>>>> 6- lttng stop >>>>>> 7- lttng destroy >>>>>> >>>>>> >>>>>> Can you reproduce if you remove the following contexts ? >>>>>> >>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>> lttng add-context -u -t perf:thread:cycles >>>>>> lttng add-context -u -t perf:thread:instructions >>>>>> >>>>>> And if you only keep a single of those contexts ? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Mathieu >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Shehab Y. Elsayed, MSc. >>>>>> PhD Student >>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>>> University of Toronto >>>>>> E-mail: shehabyomn@gmail.com >>>>>> >>>>>> >>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < >>>>>> mathieu.desnoyers@efficios.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < >>>>>>> shehabyomn@gmail.com> wrote: >>>>>>> >>>>>>> Hello All, >>>>>>> >>>>>>> I am trying to instrument a pthread application using the provided >>>>>>> pthread wrapper, but I sometimes run into a "Double free or >>>>>>> corruption error (fasttop)" error. >>>>>>> >>>>>>> >>>>>>> Please provide more information about the version of lttng-ust you >>>>>>> are using, and how you setup >>>>>>> your tracing session. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Mathieu >>>>>>> >>>>>>> >>>>>>> >>>>>>> Here is a description of what I have tried and noticed: >>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes >>>>>>> works as expected. >>>>>>> 2- From my experiments, the problem happens (more frequently at >>>>>>> least) when adding performance counter contexts (I tried cycles, cpu_cycles >>>>>>> and instructions). >>>>>>> 3- I am testing using lu_ncb from splash3 benchmark suite after >>>>>>> setting LD_PRELOAD to use the pthread wrapper as described in the >>>>>>> LTTng documents. >>>>>>> 4- Here is the backtrace printed after exiting: >>>>>>> ======= Backtrace: ========= >>>>>>> /lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+0x35) >>>>>>> [0x7ffff7471575] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+0x21c >>>>>>> )[0x7ffff747363c] >>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[ >>>>>>> 0x7ffff746dccf] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[ >>>>>>> 0x7ffff746dccf] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[ >>>>>>> 0x7ffff746dccf] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[0x >>>>>>> 7ffff746ead8] >>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3] >>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[0x7fff >>>>>>> f745ed57] >>>>>>> /lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a] >>>>>>> /lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49] >>>>>>> /lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95] >>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51] >>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35] >>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44] >>>>>>> 5- Also, this is a backtrace I obtained from gdb: >>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6 >>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6 >>>>>>> #2 0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6 >>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6 >>>>>>> #4 0x00007ffff768ad25 in lttng_destroy_perf_counter_field ( >>>>>>> field=) at lttng-context-perf-counters.c:418 >>>>>>> #5 0x00007ffff767a575 in lttng_destroy_context ( >>>>>>> ctx=0x7ffff0011090) at lttng-context.c:278 >>>>>>> #6 0x00007ffff767c63c in _lttng_channel_unmap ( >>>>>>> lttng_chan=0x7ffff0010f40) at lttng-events.c:172 >>>>>>> #7 lttng_session_destroy (session=0x7ffff0000900) >>>>>>> at lttng-events.c:247 >>>>>>> #8 0x00007ffff7676906 in lttng_release_session ( >>>>>>> objd=) at lttng-ust-abi.c:601 >>>>>>> #9 0x00007ffff7676ccf in lttng_ust_objd_unref (id=1, >>>>>>> is_owner=) at lttng-ust-abi.c:216 >>>>>>> #10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2, >>>>>>> is_owner=) at lttng-ust-abi.c:216 >>>>>>> #11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id@entry=18, >>>>>>> is_owner=is_owner@entry=1) at lttng-ust-abi.c:216 >>>>>>> #12 0x00007ffff7677ad8 in objd_table_destroy () >>>>>>> at lttng-ust-abi.c:235 >>>>>>> #13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002 >>>>>>> #14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1) >>>>>>> at lttng-ust-comm.c:1807 >>>>>>> #15 0x00007ffff7667d57 in lttng_ust_exit () >>>>>>> at lttng-ust-comm.c:1874 >>>>>>> #16 0x00007ffff7dec85a in _dl_fini () >>>>>>> from /lib64/ld-linux-x86-64.so.2 >>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers () >>>>>>> from /lib64/libc.so.6 >>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6 >>>>>>> #19 0x0000000000401b51 in main (argc=, >>>>>>> argv=) at lu.c:368 >>>>>>> >>>>>>> Any ideas, why this happens and how to fix it? >>>>>>> >>>>>>> Thanks, >>>>>>> Shehab >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> lttng-dev mailing list >>>>>>> lttng-dev@lists.lttng.org >>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mathieu Desnoyers >>>>>>> EfficiOS Inc. >>>>>>> http://www.efficios.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Mathieu Desnoyers >>>>>> EfficiOS Inc. >>>>>> http://www.efficios.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >>> >> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com >> > >