* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found] <20998D40D9A2B7499CA5A3A2666CB1EB2D9DDA59@ZURMSG1.QUANTUM.com>
@ 2014-12-11 15:36 ` Mathieu Desnoyers
       [not found] ` <1745838195.26177.1418312190246.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2014-12-11 15:36 UTC (permalink / raw)
  To: David OShea; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 4749 bytes --]

----- Original Message -----

> From: "David OShea" <David.OShea@quantum.com>
> To: "lttng-dev" <lttng-dev@lists.lttng.org>
> Sent: Sunday, December 7, 2014 10:30:04 PM
> Subject: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent

> Hi all,

> We have encountered a problem with using LTTng-UST tracing with our
> application, where on a particular VMware vCenter cluster we almost always get
> segfaults when tracepoints are enabled, whereas on another vCenter cluster,
> and on every other machine we’ve ever used, we don’t hit this problem.

> I can reproduce this using lttng-ust/tests/hello after using:

> """

> lttng create

> lttng enable-channel channel0 --userspace

> lttng add-context --userspace -t vpid -t vtid -t procname

> lttng enable-event --userspace "ust_tests_hello:*" -c channel0

> lttng start

> """

> In which case I get the following stack trace with an obvious NULL pointer
> dereference:

> """

> Program terminated with signal SIGSEGV, Segmentation fault.

> #0 v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48

> 48 return uatomic_read(&v_a->a);

> [...]

> #0 v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48

> #1 0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (

> buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,

> ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677

> #2 0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)

> at ring_buffer_frontend.c:1819

> #3 0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,

> config=0x7f4aa12b8ae0 <client_config>)

> at ../libringbuffer/frontend_api.h:211

> #4 lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)

> at lttng-ring-buffer-client.h:473

> #5 0x000000000040135f in __event_probe__ust_tests_hello___tptest (

> __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,

> text=0x7fffef67cb70 "test", textlen=<optimized out>, doublearg=2,

> floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32

> #6 0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (

> boolarg=true, floatarg=2222, doublearg=2, textlen=4,

> text=0x7fffef67cb70 "test", values=0x7fffef67cb50,

> netint=<optimized out>, anint=0) at ust_tests_hello.h:32

> #7 main (argc=<optimized out>, argv=<optimized out>) at hello.c:92

> """

> I hit this segfault 10 out of 10 times I ran “hello” on a VM on one vCenter
> and 0 out of 10 times I ran it on the other, and the VMs otherwise had the
> same software installed on them:

> - CentOS 6-based

> - kernel-2.6.32-504.1.3.el6 with some minor changes made in networking

> - userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2 which might have
> some minor patches backported, and leftovers of changes to get them to build
> on CentOS 5

> On the “good” vCenter, I tested on two different VM hosts:

> Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz

> EVC Mode: Intel(R) "Nehalem" Generation

> Image Profile: (Updated) ESXi-5.1.0-799733-standard

> Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz

> EVC Mode: Intel(R) "Nehalem" Generation

> Image Profile: (Updated) ESXi-5.1.0-799733-standard

> The “bad” vCenter VM host that I tested on had this configuration:

> ESX Version: VMware ESXi, 5.0.0, 469512

> Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz

> Any ideas?

My bet would be that the OS is lying to userspace about the 
number of possible CPUs. I wonder what liblttng-ust 
libringbuffer/shm.h num_possible_cpus() is returning compared 
to what lib_ring_buffer_get_cpu() returns. 

Can you check this out ? 
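
As a quick standalone check (just a sketch built on sched_getcpu() and sysconf(),
not the LTTng-internal helpers named above), something like this shows whether the
CPU number the kernel reports ever exceeds the configured count:

"""
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Standalone sanity check, independent of LTTng: if sched_getcpu() ever
 * reports a CPU number >= sysconf(_SC_NPROCESSORS_CONF), then any per-CPU
 * table sized from the latter will be indexed out of range.
 */
int main(void)
{
	long conf = sysconf(_SC_NPROCESSORS_CONF);
	long i;

	for (i = 0; i < 10000000; i++) {
		int cpu = sched_getcpu();

		if (cpu < 0 || cpu >= conf) {
			printf("observed cpu %d, outside 0..%ld\n", cpu, conf - 1);
			return 1;
		}
	}
	printf("all observed cpu numbers within 0..%ld\n", conf - 1);
	return 0;
}
"""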

Thanks, 

Mathieu 

> Thanks in advance,
> David

> The information contained in this transmission may be confidential. Any
> disclosure, copying, or further distribution of confidential information is
> not permitted unless such privilege is explicitly granted in writing by
> Quantum. Quantum reserves the right to have electronic communications,
> including email and attachments, sent across its networks filtered through
> anti virus and spam software programs and retain such messages in order to
> comply with applicable data security and retention requirements. Quantum is
> not responsible for the proper and complete transmission of the substance of
> this communication or for any delay in its receipt.

> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 

[-- Attachment #1.2: Type: text/html, Size: 8339 bytes --]

[-- Attachment #2: Type: text/plain, Size: 155 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found] ` <1745838195.26177.1418312190246.JavaMail.zimbra@efficios.com>
@ 2015-01-12  6:33   ` David OShea
       [not found]   ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E464C@ZURMSG1.QUANTUM.com>
  1 sibling, 0 replies; 9+ messages in thread
From: David OShea @ 2015-01-12  6:33 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev

Hi Mathieu,

Apologies for the delay in getting back to you, please see below:

> -----Original Message-----
> From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> Sent: Friday, 12 December 2014 2:07 AM
> To: David OShea
> Cc: lttng-dev
> Subject: Re: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent
> 
> ________________________________
> 
> 	From: "David OShea" <David.OShea@quantum.com>
> 	To: "lttng-dev" <lttng-dev@lists.lttng.org>
> 	Sent: Sunday, December 7, 2014 10:30:04 PM
> 	Subject: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent
> 
> 
> 
> 	Hi all,
> 
> 	We have encountered a problem with using LTTng-UST tracing with
> our application, where on a particular VMware vCenter cluster we almost
> always get segfaults when tracepoints are enabled, whereas on another
> vCenter cluster, and on every other machine we’ve ever used, we don’t
> hit this problem.
> 
> 	I can reproduce this using lttng-ust/tests/hello after using:
> 
> 	"""
> 
> 	lttng create
> 
> 	lttng enable-channel channel0 --userspace
> 
> 	lttng add-context --userspace -t vpid -t vtid -t procname
> 
> 	lttng enable-event --userspace "ust_tests_hello:*" -c channel0
> 
> 	lttng start
> 
> 	"""
> 
> 	In which case I get the following stack trace with an obvious
> NULL pointer dereference:
> 
> 	"""
> 
> 	Program terminated with signal SIGSEGV, Segmentation fault.
> 
> 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> 
> 	48              return uatomic_read(&v_a->a);
> 
> 	[...]
> 
> 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> 
> 	#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
> 
> 	    buf=0x7f4a98008a00, chan=0x7f4a98008a00,
> offsets=0x7fffef67c620,
> 
> 	    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
> 
> 	#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow
> (ctx=0x7fffef67ca40)
> 
> 	    at ring_buffer_frontend.c:1819
> 
> 	#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve
> (ctx=0x7fffef67ca40,
> 
> 	    config=0x7f4aa12b8ae0 <client_config>)
> 
> 	    at ../libringbuffer/frontend_api.h:211
> 
> 	#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
> 
> 	    at lttng-ring-buffer-client.h:473
> 
> 	#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest
> (
> 
> 	    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
> 
> 	    text=0x7fffef67cb70 "test", textlen=<optimized out>,
> doublearg=2,
> 
> 	    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
> 
> 	#6  0x0000000000400d2c in
> __tracepoint_cb_ust_tests_hello___tptest (
> 
> 	    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
> 
> 	    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
> 
> 	    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
> 
> 	#7  main (argc=<optimized out>, argv=<optimized out>) at
> hello.c:92
> 
> 	"""
> 
> 	I hit this segfault 10 out of 10 times I ran “hello” on a VM on
> one vCenter and 0 out of 10 times I ran it on the other, and the VMs
> otherwise had the same software installed on them:
> 
> 	- CentOS 6-based
> 
> 	- kernel-2.6.32-504.1.3.el6 with some minor changes made in
> networking
> 
> 	- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2
> which might have some minor patches backported, and leftovers of
> changes to get them to build on CentOS 5
> 
> 	On the “good” vCenter, I tested on two different VM hosts:
> 
> 	Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
> 
> 	EVC Mode: Intel(R) "Nehalem" Generation
> 
> 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> 
> 	Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> 
> 	EVC Mode: Intel(R) "Nehalem" Generation
> 
> 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> 
> 	The “bad” vCenter VM host that I tested on had this
> configuration:
> 
> 	ESX Version: VMware ESXi, 5.0.0, 469512
> 
> 	Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz
> 
> 	Any ideas?
> 
> 
> My bet would be that the OS is lying to userspace about the
> number of possible CPUs. I wonder what liblttng-ust
> libringbuffer/shm.h num_possible_cpus() is returning compared
> to what lib_ring_buffer_get_cpu() returns.
> 
> 
> Can you check this out ?

Yes, this seems to be the case - 'gdb' on the core dump shows:

(gdb) p __num_possible_cpus
$1 = 2

which is consistent with how I configured the virtual machine, and with this output:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Stepping:              4
CPU MHz:               1995.000
BogoMIPS:              3990.00
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              18432K
NUMA node0 CPU(s):     0,1

Despite the fact that there are 2 CPUs, when I hacked lttng-ring-buffer-client.h to output the result of lib_ring_buffer_get_cpu() and then ran tests/hello with tracing enabled, I could see it would sit on CPU 0 for a while, or CPU 1, and perhaps move between the two, but eventually either 2 or 3 would appear, immediately followed by the segfault.
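
A toy model of the failure mode, purely illustrative and not the actual lttng-ust data structures, is sketched below: per-CPU slots exist only for the CPUs the kernel says are possible, while the lookup uses whatever CPU number the running thread observes, so an out-of-range CPU id finds no slot and ends in a NULL dereference like the v_read(v_a=0x0) frame above.

"""
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_CPUS 64	/* arbitrary upper bound for the illustration */

int main(void)
{
	long nr_possible = sysconf(_SC_NPROCESSORS_CONF);	/* 2 on the affected VM */
	long *slot[MAX_CPUS] = { 0 };				/* entries with no buffer stay NULL */
	long cpu;

	/* Allocate one slot per "possible" CPU, as reported by the kernel. */
	for (cpu = 0; cpu < nr_possible && cpu < MAX_CPUS; cpu++)
		slot[cpu] = calloc(1, sizeof(long));

	/* Look up the slot for whatever CPU we are actually running on. */
	cpu = sched_getcpu();	/* observed to be 2 or 3 on the bad host */
	if (cpu < 0 || cpu >= MAX_CPUS || !slot[cpu]) {
		fprintf(stderr, "cpu %ld has no slot; dereferencing it would crash\n", cpu);
		return 1;
	}
	printf("cpu %ld has a slot, counter = %ld\n", cpu, *slot[cpu]);
	return 0;
}
"""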

The VM host has 4 sockets, 8 cores per socket, with Hyper-Threading enabled.  The VM has its "HT Sharing" option set to "Any", which according to https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc_50%2FGUID-101176D4-9866-420D-AB4F-6374025CABDA.html means that each one of the virtual machine's virtual cores can share a physical core with another virtual machine, each virtual core using a different thread on that physical core.  I assume none of this should be relevant except perhaps if there are bugs in VMware.

Is it possible that this is an issue in LTTng, or should I work out how the kernel works out which CPU it is running on and then look into whether there are any VMware bugs in this area?

Thanks in advance,
David

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]   ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E464C@ZURMSG1.QUANTUM.com>
@ 2015-01-12 15:34     ` Mathieu Desnoyers
       [not found]     ` <1205251820.37907.1421076877053.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-01-12 15:34 UTC (permalink / raw)
  To: David OShea; +Cc: lttng-dev

----- Original Message -----
> From: "David OShea" <David.OShea@quantum.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> Sent: Monday, January 12, 2015 1:33:07 AM
> Subject: RE: [lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
> - CPU/VMware dependent
> 
> Hi Mathieu,
> 
> Apologies for the delay in getting back to you, please see below:
> 
> > -----Original Message-----
> > From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> > Sent: Friday, 12 December 2014 2:07 AM
> > To: David OShea
> > Cc: lttng-dev
> > Subject: Re: [lttng-dev] Segfault at v_read() called from
> > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> > dependent
> > 
> > ________________________________
> > 
> > 	From: "David OShea" <David.OShea@quantum.com>
> > 	To: "lttng-dev" <lttng-dev@lists.lttng.org>
> > 	Sent: Sunday, December 7, 2014 10:30:04 PM
> > 	Subject: [lttng-dev] Segfault at v_read() called from
> > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> > dependent
> > 
> > 
> > 
> > 	Hi all,
> > 
> > 	We have encountered a problem with using LTTng-UST tracing with
> > our application, where on a particular VMware vCenter cluster we almost
> > always get segfaults when tracepoints are enabled, whereas on another
> > vCenter cluster, and on every other machine we’ve ever used, we don’t
> > hit this problem.
> > 
> > 	I can reproduce this using lttng-ust/tests/hello after using:
> > 
> > 	"""
> > 
> > 	lttng create
> > 
> > 	lttng enable-channel channel0 --userspace
> > 
> > 	lttng add-context --userspace -t vpid -t vtid -t procname
> > 
> > 	lttng enable-event --userspace "ust_tests_hello:*" -c channel0
> > 
> > 	lttng start
> > 
> > 	"""
> > 
> > 	In which case I get the following stack trace with an obvious
> > NULL pointer dereference:
> > 
> > 	"""
> > 
> > 	Program terminated with signal SIGSEGV, Segmentation fault.
> > 
> > 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> > 
> > 	48              return uatomic_read(&v_a->a);
> > 
> > 	[...]
> > 
> > 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> > 
> > 	#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
> > 
> > 	    buf=0x7f4a98008a00, chan=0x7f4a98008a00,
> > offsets=0x7fffef67c620,
> > 
> > 	    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
> > 
> > 	#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow
> > (ctx=0x7fffef67ca40)
> > 
> > 	    at ring_buffer_frontend.c:1819
> > 
> > 	#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve
> > (ctx=0x7fffef67ca40,
> > 
> > 	    config=0x7f4aa12b8ae0 <client_config>)
> > 
> > 	    at ../libringbuffer/frontend_api.h:211
> > 
> > 	#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
> > 
> > 	    at lttng-ring-buffer-client.h:473
> > 
> > 	#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest
> > (
> > 
> > 	    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
> > 
> > 	    text=0x7fffef67cb70 "test", textlen=<optimized out>,
> > doublearg=2,
> > 
> > 	    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
> > 
> > 	#6  0x0000000000400d2c in
> > __tracepoint_cb_ust_tests_hello___tptest (
> > 
> > 	    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
> > 
> > 	    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
> > 
> > 	    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
> > 
> > 	#7  main (argc=<optimized out>, argv=<optimized out>) at
> > hello.c:92
> > 
> > 	"""
> > 
> > 	I hit this segfault 10 out of 10 times I ran “hello” on a VM on
> > one vCenter and 0 out of 10 times I ran it on the other, and the VMs
> > otherwise had the same software installed on them:
> > 
> > 	- CentOS 6-based
> > 
> > 	- kernel-2.6.32-504.1.3.el6 with some minor changes made in
> > networking
> > 
> > 	- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2
> > which might have some minor patches backported, and leftovers of
> > changes to get them to build on CentOS 5
> > 
> > 	On the “good” vCenter, I tested on two different VM hosts:
> > 
> > 	Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
> > 
> > 	EVC Mode: Intel(R) "Nehalem" Generation
> > 
> > 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> > 
> > 	Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> > 
> > 	EVC Mode: Intel(R) "Nehalem" Generation
> > 
> > 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> > 
> > 	The “bad” vCenter VM host that I tested on had this
> > configuration:
> > 
> > 	ESX Version: VMware ESXi, 5.0.0, 469512
> > 
> > 	Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz
> > 
> > 	Any ideas?
> > 
> > 
> > My bet would be that the OS is lying to userspace about the
> > number of possible CPUs. I wonder what liblttng-ust
> > libringbuffer/shm.h num_possible_cpus() is returning compared
> > to what lib_ring_buffer_get_cpu() returns.
> > 
> > 
> > Can you check this out ?
> 
> Yes, this seems to be the case - 'gdb' on the core dump shows:
> 
> (gdb) p __num_possible_cpus
> $1 = 2
> 
> which is consistent with how I configured the virtual machine, which is
> consistent with this output:
> 
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                2
> On-line CPU(s) list:   0,1
> Thread(s) per core:    1
> Core(s) per socket:    1
> Socket(s):             2
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 26
> Stepping:              4
> CPU MHz:               1995.000
> BogoMIPS:              3990.00
> Hypervisor vendor:     VMware
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              18432K
> NUMA node0 CPU(s):     0,1
> 
> Despite the fact that there are 2 CPUs, when I hacked
> lttng-ring-buffer-client.h to output the result of lib_ring_buffer_get_cpu()
> and then ran tests/hello with tracing enabled, I could see it would sit on
> CPU 0 for a while, or CPU 1, and perhaps move between the two, but
> eventually either 2 or 3 would appear, immediately followed by the segfault.
> 
> The VM host has 4 sockets, 8 cores per socket, with Hyper-Threading enabled.
> The VM has its "HT Sharing" option set to "Any", which according to
> https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc_50%2FGUID-101176D4-9866-420D-AB4F-6374025CABDA.html
> means that each one of the virtual machine's virtual cores can share a
> physical core with another virtual machine, each virtual core using a
> different thread on that physical core.  I assume none of this should be
> relevant except perhaps if there are bugs in VMware.
> 
> Is it possible that this is an issue in LTTng, or should I work out how the
> kernel works out which CPU it is running on and then look into whether there
> are any VMware bugs in this area?

This appears to be very likely a VMware bug. /proc/cpuinfo should show
4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the current
CPU number can be 0, 1, 2, 3 throughout execution.

Thanks,

Mathieu


> 
> Thanks in advance,
> David
> 
> ----------------------------------------------------------------------
> The information contained in this transmission may be confidential. Any
> disclosure, copying, or further distribution of confidential information is
> not permitted unless such privilege is explicitly granted in writing by
> Quantum. Quantum reserves the right to have electronic communications,
> including email and attachments, sent across its networks filtered through
> anti virus and spam software programs and retain such messages in order to
> comply with applicable data security and retention requirements. Quantum is
> not responsible for the proper and complete transmission of the substance of
> this communication or for any delay in its receipt.
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]     ` <1205251820.37907.1421076877053.JavaMail.zimbra@efficios.com>
@ 2015-01-12 15:36       ` Mathieu Desnoyers
       [not found]       ` <1686087516.37913.1421076968595.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-01-12 15:36 UTC (permalink / raw)
  To: David OShea; +Cc: lttng-dev

----- Original Message -----
> From: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> To: "David OShea" <David.OShea@quantum.com>
> Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> Sent: Monday, January 12, 2015 10:34:37 AM
> Subject: Re: [lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
> - CPU/VMware dependent
> 
> ----- Original Message -----
> > From: "David OShea" <David.OShea@quantum.com>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> > Sent: Monday, January 12, 2015 1:33:07 AM
> > Subject: RE: [lttng-dev] Segfault at v_read() called from
> > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
> > - CPU/VMware dependent
> > 
> > Hi Mathieu,
> > 
> > Apologies for the delay in getting back to you, please see below:
> > 
> > > -----Original Message-----
> > > From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> > > Sent: Friday, 12 December 2014 2:07 AM
> > > To: David OShea
> > > Cc: lttng-dev
> > > Subject: Re: [lttng-dev] Segfault at v_read() called from
> > > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> > > dependent
> > > 
> > > ________________________________
> > > 
> > > 	From: "David OShea" <David.OShea@quantum.com>
> > > 	To: "lttng-dev" <lttng-dev@lists.lttng.org>
> > > 	Sent: Sunday, December 7, 2014 10:30:04 PM
> > > 	Subject: [lttng-dev] Segfault at v_read() called from
> > > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> > > dependent
> > > 
> > > 
> > > 
> > > 	Hi all,
> > > 
> > > 	We have encountered a problem with using LTTng-UST tracing with
> > > our application, where on a particular VMware vCenter cluster we almost
> > > always get segfaults when tracepoints are enabled, whereas on another
> > > vCenter cluster, and on every other machine we’ve ever used, we don’t
> > > hit this problem.
> > > 
> > > 	I can reproduce this using lttng-ust/tests/hello after using:
> > > 
> > > 	"""
> > > 
> > > 	lttng create
> > > 
> > > 	lttng enable-channel channel0 --userspace
> > > 
> > > 	lttng add-context --userspace -t vpid -t vtid -t procname
> > > 
> > > 	lttng enable-event --userspace "ust_tests_hello:*" -c channel0
> > > 
> > > 	lttng start
> > > 
> > > 	"""
> > > 
> > > 	In which case I get the following stack trace with an obvious
> > > NULL pointer dereference:
> > > 
> > > 	"""
> > > 
> > > 	Program terminated with signal SIGSEGV, Segmentation fault.
> > > 
> > > 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> > > 
> > > 	48              return uatomic_read(&v_a->a);
> > > 
> > > 	[...]
> > > 
> > > 	#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> > > 
> > > 	#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
> > > 
> > > 	    buf=0x7f4a98008a00, chan=0x7f4a98008a00,
> > > offsets=0x7fffef67c620,
> > > 
> > > 	    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
> > > 
> > > 	#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow
> > > (ctx=0x7fffef67ca40)
> > > 
> > > 	    at ring_buffer_frontend.c:1819
> > > 
> > > 	#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve
> > > (ctx=0x7fffef67ca40,
> > > 
> > > 	    config=0x7f4aa12b8ae0 <client_config>)
> > > 
> > > 	    at ../libringbuffer/frontend_api.h:211
> > > 
> > > 	#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
> > > 
> > > 	    at lttng-ring-buffer-client.h:473
> > > 
> > > 	#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest
> > > (
> > > 
> > > 	    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
> > > 
> > > 	    text=0x7fffef67cb70 "test", textlen=<optimized out>,
> > > doublearg=2,
> > > 
> > > 	    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
> > > 
> > > 	#6  0x0000000000400d2c in
> > > __tracepoint_cb_ust_tests_hello___tptest (
> > > 
> > > 	    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
> > > 
> > > 	    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
> > > 
> > > 	    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
> > > 
> > > 	#7  main (argc=<optimized out>, argv=<optimized out>) at
> > > hello.c:92
> > > 
> > > 	"""
> > > 
> > > 	I hit this segfault 10 out of 10 times I ran “hello” on a VM on
> > > one vCenter and 0 out of 10 times I ran it on the other, and the VMs
> > > otherwise had the same software installed on them:
> > > 
> > > 	- CentOS 6-based
> > > 
> > > 	- kernel-2.6.32-504.1.3.el6 with some minor changes made in
> > > networking
> > > 
> > > 	- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2
> > > which might have some minor patches backported, and leftovers of
> > > changes to get them to build on CentOS 5
> > > 
> > > 	On the “good” vCenter, I tested on two different VM hosts:
> > > 
> > > 	Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
> > > 
> > > 	EVC Mode: Intel(R) "Nehalem" Generation
> > > 
> > > 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> > > 
> > > 	Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> > > 
> > > 	EVC Mode: Intel(R) "Nehalem" Generation
> > > 
> > > 	Image Profile: (Updated) ESXi-5.1.0-799733-standard
> > > 
> > > 	The “bad” vCenter VM host that I tested on had this
> > > configuration:
> > > 
> > > 	ESX Version: VMware ESXi, 5.0.0, 469512
> > > 
> > > 	Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz
> > > 
> > > 	Any ideas?
> > > 
> > > 
> > > My bet would be that the OS is lying to userspace about the
> > > number of possible CPUs. I wonder what liblttng-ust
> > > libringbuffer/shm.h num_possible_cpus() is returning compared
> > > to what lib_ring_buffer_get_cpu() returns.
> > > 
> > > 
> > > Can you check this out ?
> > 
> > Yes, this seems to be the case - 'gdb' on the core dump shows:
> > 
> > (gdb) p __num_possible_cpus
> > $1 = 2
> > 
> > which is consistent with how I configured the virtual machine, which is
> > consistent with this output:
> > 
> > # lscpu
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                2
> > On-line CPU(s) list:   0,1
> > Thread(s) per core:    1
> > Core(s) per socket:    1
> > Socket(s):             2
> > NUMA node(s):          1
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 26
> > Stepping:              4
> > CPU MHz:               1995.000
> > BogoMIPS:              3990.00
> > Hypervisor vendor:     VMware
> > Virtualization type:   full
> > L1d cache:             32K
> > L1i cache:             32K
> > L2 cache:              256K
> > L3 cache:              18432K
> > NUMA node0 CPU(s):     0,1
> > 
> > Despite the fact that there are 2 CPUs, when I hacked
> > lttng-ring-buffer-client.h to output the result of
> > lib_ring_buffer_get_cpu()
> > and then ran tests/hello with tracing enabled, I could see it would sit on
> > CPU 0 for a while, or CPU 1, and perhaps move between the two, but
> > eventually either 2 or 3 would appear, immediately followed by the
> > segfault.
> > 
> > The VM host has 4 sockets, 8 cores per socket, with Hyper-Threading
> > enabled.
> > The VM has its "HT Sharing" option set to "Any", which according to
> > https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc_50%2FGUID-101176D4-9866-420D-AB4F-6374025CABDA.html
> > means that each one of the virtual machine's virtual cores can share a
> > physical core with another virtual machine, each virtual core using a
> > different thread on that physical core.  I assume none of this should be
> > relevant except perhaps if there are bugs in VMware.
> > 
> > Is it possible that this is an issue in LTTng, or should I work out how the
> > kernel works out which CPU it is running on and then look into whether
> > there
> > are any VMware bugs in this area?
> 
> This appears to be very likely a VMware bug. /proc/cpuinfo should show
> 4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the current
> CPU number can be 0, 1, 2, 3 throughout execution.

You might want to look at the sysconf(3) manpage, especially the parts about
_SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. My guess is that vmware is lying
about the number of "possible" CPUs (_SC_NPROCESSORS_CONF).
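
A few lines of C are enough to print both values on the affected guest (a standalone
check, not LTTng code):

"""
#include <stdio.h>
#include <unistd.h>

/* Print the configured and online CPU counts discussed above. */
int main(void)
{
	printf("_SC_NPROCESSORS_CONF = %ld\n", sysconf(_SC_NPROCESSORS_CONF));
	printf("_SC_NPROCESSORS_ONLN = %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
	return 0;
}
"""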

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Thanks in advance,
> > David
> > 
> > ----------------------------------------------------------------------
> > The information contained in this transmission may be confidential. Any
> > disclosure, copying, or further distribution of confidential information is
> > not permitted unless such privilege is explicitly granted in writing by
> > Quantum. Quantum reserves the right to have electronic communications,
> > including email and attachments, sent across its networks filtered through
> > anti virus and spam software programs and retain such messages in order to
> > comply with applicable data security and retention requirements. Quantum is
> > not responsible for the proper and complete transmission of the substance
> > of
> > this communication or for any delay in its receipt.
> > 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]       ` <1686087516.37913.1421076968595.JavaMail.zimbra@efficios.com>
@ 2015-01-15  2:45         ` David OShea
       [not found]         ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E72E6@ZURMSG1.QUANTUM.com>
  1 sibling, 0 replies; 9+ messages in thread
From: David OShea @ 2015-01-15  2:45 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev

Hi Mathieu,

> -----Original Message-----
> From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> Sent: Tuesday, 13 January 2015 2:06 AM
> To: David OShea
> Cc: lttng-dev
> Subject: Re: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent
[...]
> > > Is it possible that this is an issue in LTTng, or should I work out
> how the
> > > kernel works out which CPU it is running on and then look into
> whether
> > > there
> > > are any VMware bugs in this area?
> >
> > This appears to be very likely a VMware bug. /proc/cpuinfo should
> show
> > 4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the
> current
> > CPU number can be 0, 1, 2, 3 throughout execution.

/proc/cpuinfo shows two CPUs:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 4
microcode       : 8
cpu MHz         : 1995.000
cache size      : 18432 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm ida dts
bogomips        : 3990.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 4
microcode       : 8
cpu MHz         : 1995.000
cache size      : 18432 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm ida dts
bogomips        : 3990.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

> You might want to look at the sysconf(3) manpage, especially the parts
> about
> _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. My guess is that vmware
> is lying
> about the number of "possible" CPUs (_SC_NPROCESSORS_CONF).

_SC_NPROCESSORS_CONF = 2
_SC_NPROCESSORS_ONLN = 2

Thanks for the pointers, I will look into possible VMware bugs.

Out of curiosity, what happens if I happened to have a system with hot-pluggable CPUs - does _SC_NPROCESSORS_CONF reflect the maximum number of CPUs I can insert, and that is how many LTTng will support?

Thanks,
David

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]         ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E72E6@ZURMSG1.QUANTUM.com>
@ 2015-01-15  2:50           ` Mathieu Desnoyers
       [not found]           ` <214724236.41002.1421290246531.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-01-15  2:50 UTC (permalink / raw)
  To: David OShea; +Cc: lttng-dev

----- Original Message -----
> From: "David OShea" <David.OShea@quantum.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> Sent: Wednesday, January 14, 2015 9:45:01 PM
> Subject: RE: [lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
> - CPU/VMware dependent
> 
> Hi Mathieu,
> 
> > -----Original Message-----
> > From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> > Sent: Tuesday, 13 January 2015 2:06 AM
> > To: David OShea
> > Cc: lttng-dev
> > Subject: Re: [lttng-dev] Segfault at v_read() called from
> > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> > dependent
> [...]
> > > > Is it possible that this is an issue in LTTng, or should I work out
> > how the
> > > > kernel works out which CPU it is running on and then look into
> > whether
> > > > there
> > > > are any VMware bugs in this area?
> > >
> > > This appears to be very likely a VMware bug. /proc/cpuinfo should
> > show
> > > 4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the
> > current
> > > CPU number can be 0, 1, 2, 3 throughout execution.
> 
> /proc/cpuinfo shows two CPUs:
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 26
> model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
> stepping        : 4
> microcode       : 8
> cpu MHz         : 1995.000
> cache size      : 18432 KB
> physical id     : 0
> siblings        : 1
> core id         : 0
> cpu cores       : 1
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 11
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm
> constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
> aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor
> lahf_lm ida dts
> bogomips        : 3990.00
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
> 
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 26
> model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
> stepping        : 4
> microcode       : 8
> cpu MHz         : 1995.000
> cache size      : 18432 KB
> physical id     : 2
> siblings        : 1
> core id         : 0
> cpu cores       : 1
> apicid          : 2
> initial apicid  : 2
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 11
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm
> constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
> aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor
> lahf_lm ida dts
> bogomips        : 3990.00
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
> 
> > You might want to look at the sysconf(3) manpage, especially the parts
> > about
> > _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. My guess is that vmware
> > is lying
> > about the number of "possible" CPUs (_SC_NPROCESSORS_CONF).
> 
> _SC_NPROCESSORS_CONF = 2
> _SC_NPROCESSORS_ONLN = 2
> 
> Thanks for the pointers, I will look into possible VMware bugs.
> 
> Out of curiosity, what happens if I happened to have a system with
> hot-pluggable CPUs - does _SC_NPROCESSORS_CONF reflect the maximum number of
> CPUs I can insert, and that is how many LTTng will support?

Yes, exactly.

Thanks,

Mathieu

> 
> Thanks,
> David
> 
> ----------------------------------------------------------------------
> The information contained in this transmission may be confidential. Any
> disclosure, copying, or further distribution of confidential information is
> not permitted unless such privilege is explicitly granted in writing by
> Quantum. Quantum reserves the right to have electronic communications,
> including email and attachments, sent across its networks filtered through
> anti virus and spam software programs and retain such messages in order to
> comply with applicable data security and retention requirements. Quantum is
> not responsible for the proper and complete transmission of the substance of
> this communication or for any delay in its receipt.
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]           ` <214724236.41002.1421290246531.JavaMail.zimbra@efficios.com>
@ 2015-09-03  2:14             ` David OShea
       [not found]             ` <20998D40D9A2B7499CA5A3A2666CB1EB5EAD6F13@ZURMSG1.QUANTUM.com>
  1 sibling, 0 replies; 9+ messages in thread
From: David OShea @ 2015-09-03  2:14 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev

For the record, it appears that upgrading from VMware ESXi version 5.0.0, 469512 to version 5.5.0, 2068190 ("Update 2") resolved this issue.  However, we had other hosts running version 5.1.0, 799733, which should have been set to the same CPU architecture (Nehalem) and which didn't have the issue, so presumably the fix was already included in that version.

Thanks,
David

> -----Original Message-----
> From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> Sent: Thursday, 15 January 2015 1:21 PM
> To: David OShea
> Cc: lttng-dev
> Subject: Re: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent
> 
> ----- Original Message -----
> > From: "David OShea" <David.OShea@quantum.com>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> > Sent: Wednesday, January 14, 2015 9:45:01 PM
> > Subject: RE: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
> > - CPU/VMware dependent
> >
> > Hi Mathieu,
> >
> > > -----Original Message-----
> > > From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
> > > Sent: Tuesday, 13 January 2015 2:06 AM
> > > To: David OShea
> > > Cc: lttng-dev
> > > Subject: Re: [lttng-dev] Segfault at v_read() called from
> > > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app -
> CPU/VMware
> > > dependent
> > [...]
> > > > > Is it possible that this is an issue in LTTng, or should I work
> out
> > > how the
> > > > > kernel works out which CPU it is running on and then look into
> > > whether
> > > > > there
> > > > > are any VMware bugs in this area?
> > > >
> > > > This appears to be very likely a VMware bug. /proc/cpuinfo should
> > > show
> > > > 4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the
> > > current
> > > > CPU number can be 0, 1, 2, 3 throughout execution.
> >
> > /proc/cpuinfo shows two CPUs:
> >
> > processor       : 0
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 26
> > model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
> > stepping        : 4
> > microcode       : 8
> > cpu MHz         : 1995.000
> > cache size      : 18432 KB
> > physical id     : 0
> > siblings        : 1
> > core id         : 0
> > cpu cores       : 1
> > apicid          : 0
> > initial apicid  : 0
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 11
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca
> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
> rdtscp lm
> > constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
> > aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt
> hypervisor
> > lahf_lm ida dts
> > bogomips        : 3990.00
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> > power management:
> >
> > processor       : 1
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 26
> > model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
> > stepping        : 4
> > microcode       : 8
> > cpu MHz         : 1995.000
> > cache size      : 18432 KB
> > physical id     : 2
> > siblings        : 1
> > core id         : 0
> > cpu cores       : 1
> > apicid          : 2
> > initial apicid  : 2
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 11
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca
> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
> rdtscp lm
> > constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
> > aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt
> hypervisor
> > lahf_lm ida dts
> > bogomips        : 3990.00
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> > power management:
> >
> > > You might want to look at the sysconf(3) manpage, especially the
> parts
> > > about
> > > _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. My guess is that
> vmware
> > > is lying
> > > about the number of "possible" CPUs (_SC_NPROCESSORS_CONF).
> >
> > _SC_NPROCESSORS_CONF = 2
> > _SC_NPROCESSORS_ONLN = 2
> >
> > Thanks for the pointers, I will look into possible VMware bugs.
> >
> > Out of curiosity, what happens if I happened to have a system with
> > hot-pluggable CPUs - does _SC_NPROCESSORS_CONF reflect the maximum
> number of
> > CPUs I can insert, and that is how many LTTng will support?
> 
> Yes, exactly.
> 
> Thanks,
> 
> Mathieu
> 
> >
> > Thanks,
> > David
> >
> > ---------------------------------------------------------------------
> -
> > The information contained in this transmission may be confidential.
> Any
> > disclosure, copying, or further distribution of confidential
> information is
> > not permitted unless such privilege is explicitly granted in writing
> by
> > Quantum. Quantum reserves the right to have electronic
> communications,
> > including email and attachments, sent across its networks filtered
> through
> > anti virus and spam software programs and retain such messages in
> order to
> > comply with applicable data security and retention requirements.
> Quantum is
> > not responsible for the proper and complete transmission of the
> substance of
> > this communication or for any delay in its receipt.
> >
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
       [not found]             ` <20998D40D9A2B7499CA5A3A2666CB1EB5EAD6F13@ZURMSG1.QUANTUM.com>
@ 2015-09-03 15:25               ` Mathieu Desnoyers
  0 siblings, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-09-03 15:25 UTC (permalink / raw)
  To: David OShea; +Cc: lttng-dev

----- On Sep 2, 2015, at 10:14 PM, David OShea David.OShea@quantum.com wrote:

> For the record, it appears that upgrading from VMware ESXi version 5.0.0, 469512
> to version 5.5.0, 2068190 ("Update 2") resolved this issue.  However, we had
> other hosts running version 5.1.0, 799733 which should have been set to the
> same CPU architecture (Nehalem) which didn't have the issue, so presumably the
> fix was included in that version.

That's good news, thanks for letting us know!

Mathieu

> 
> Thanks,
> David
> 
>> -----Original Message-----
>> From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
>> Sent: Thursday, 15 January 2015 1:21 PM
>> To: David OShea
>> Cc: lttng-dev
>> Subject: Re: [lttng-dev] Segfault at v_read() called from
>> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
>> dependent
>> 
>> ----- Original Message -----
>> > From: "David OShea" <David.OShea@quantum.com>
>> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
>> > Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
>> > Sent: Wednesday, January 14, 2015 9:45:01 PM
>> > Subject: RE: [lttng-dev] Segfault at v_read() called from
>> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app
>> > - CPU/VMware dependent
>> >
>> > Hi Mathieu,
>> >
>> > > -----Original Message-----
>> > > From: Mathieu Desnoyers [mailto:mathieu.desnoyers@efficios.com]
>> > > Sent: Tuesday, 13 January 2015 2:06 AM
>> > > To: David OShea
>> > > Cc: lttng-dev
>> > > Subject: Re: [lttng-dev] Segfault at v_read() called from
>> > > lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app -
>> CPU/VMware
>> > > dependent
>> > [...]
>> > > > > Is it possible that this is an issue in LTTng, or should I work
>> out
>> > > how the
>> > > > > kernel works out which CPU it is running on and then look into
>> > > whether
>> > > > > there
>> > > > > are any VMware bugs in this area?
>> > > >
>> > > > This appears to be very likely a VMware bug. /proc/cpuinfo should
>> > > show
>> > > > 4 CPUs (and sysconf(_SC_NPROCESSORS_CONF) should return 4) if the
>> > > current
>> > > > CPU number can be 0, 1, 2, 3 throughout execution.
>> >
>> > /proc/cpuinfo shows two CPUs:
>> >
>> > processor       : 0
>> > vendor_id       : GenuineIntel
>> > cpu family      : 6
>> > model           : 26
>> > model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
>> > stepping        : 4
>> > microcode       : 8
>> > cpu MHz         : 1995.000
>> > cache size      : 18432 KB
>> > physical id     : 0
>> > siblings        : 1
>> > core id         : 0
>> > cpu cores       : 1
>> > apicid          : 0
>> > initial apicid  : 0
>> > fpu             : yes
>> > fpu_exception   : yes
>> > cpuid level     : 11
>> > wp              : yes
>> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> pge mca
>> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
>> rdtscp lm
>> > constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
>> > aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt
>> hypervisor
>> > lahf_lm ida dts
>> > bogomips        : 3990.00
>> > clflush size    : 64
>> > cache_alignment : 64
>> > address sizes   : 40 bits physical, 48 bits virtual
>> > power management:
>> >
>> > processor       : 1
>> > vendor_id       : GenuineIntel
>> > cpu family      : 6
>> > model           : 26
>> > model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
>> > stepping        : 4
>> > microcode       : 8
>> > cpu MHz         : 1995.000
>> > cache size      : 18432 KB
>> > physical id     : 2
>> > siblings        : 1
>> > core id         : 0
>> > cpu cores       : 1
>> > apicid          : 2
>> > initial apicid  : 2
>> > fpu             : yes
>> > fpu_exception   : yes
>> > cpuid level     : 11
>> > wp              : yes
>> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> pge mca
>> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
>> rdtscp lm
>> > constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc
>> > aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt
>> hypervisor
>> > lahf_lm ida dts
>> > bogomips        : 3990.00
>> > clflush size    : 64
>> > cache_alignment : 64
>> > address sizes   : 40 bits physical, 48 bits virtual
>> > power management:
>> >
>> > > You might want to look at the sysconf(3) manpage, especially the
>> parts
>> > > about
>> > > _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. My guess is that
>> vmware
>> > > is lying
>> > > about the number of "possible" CPUs (_SC_NPROCESSORS_CONF).
>> >
>> > _SC_NPROCESSORS_CONF = 2
>> > _SC_NPROCESSORS_ONLN = 2
>> >
>> > Thanks for the pointers, I will look into possible VMware bugs.
>> >
>> > Out of curiosity, what happens if I happened to have a system with
>> > hot-pluggable CPUs - does _SC_NPROCESSORS_CONF reflect the maximum
>> number of
>> > CPUs I can insert, and that is how many LTTng will support?
>> 
>> Yes, exactly.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> >
>> > Thanks,
>> > David
>> >
>> > ---------------------------------------------------------------------
>> -
>> > The information contained in this transmission may be confidential.
>> Any
>> > disclosure, copying, or further distribution of confidential
>> information is
>> > not permitted unless such privilege is explicitly granted in writing
>> by
>> > Quantum. Quantum reserves the right to have electronic
>> communications,
>> > including email and attachments, sent across its networks filtered
>> through
>> > anti virus and spam software programs and retain such messages in
>> order to
>> > comply with applicable data security and retention requirements.
>> Quantum is
>> > not responsible for the proper and complete transmission of the
>> substance of
>> > this communication or for any delay in its receipt.
>> >
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
@ 2014-12-08  3:30 David OShea
  0 siblings, 0 replies; 9+ messages in thread
From: David OShea @ 2014-12-08  3:30 UTC (permalink / raw)
  To: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 3802 bytes --]

Hi all,

We have encountered a problem with using LTTng-UST tracing with our application, where on a particular VMware vCenter cluster we almost always get segfaults when tracepoints are enabled, whereas on another vCenter cluster, and on every other machine we've ever used, we don't hit this problem.

I can reproduce this using lttng-ust/tests/hello after using:

"""
lttng create
lttng enable-channel channel0 --userspace
lttng add-context --userspace -t vpid -t vtid -t procname
lttng enable-event --userspace "ust_tests_hello:*" -c channel0
lttng start
"""

In which case I get the following stack trace with an obvious NULL pointer dereference:

"""
Program terminated with signal SIGSEGV, Segmentation fault.
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
48              return uatomic_read(&v_a->a);
[...]
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
    buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,
    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)
    at ring_buffer_frontend.c:1819
#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,
    config=0x7f4aa12b8ae0 <client_config>)
    at ../libringbuffer/frontend_api.h:211
#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
    at lttng-ring-buffer-client.h:473
#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest (
    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
    text=0x7fffef67cb70 "test", textlen=<optimized out>, doublearg=2,
    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
#6  0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (
    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
#7  main (argc=<optimized out>, argv=<optimized out>) at hello.c:92
"""

I hit this segfault 10 out of 10 times I ran "hello" on a VM on one vCenter and 0 out of 10 times I ran it on the other, and the VMs otherwise had the same software installed on them:

- CentOS 6-based
- kernel-2.6.32-504.1.3.el6 with some minor changes made in networking
- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2 which might have some minor patches backported, and leftovers of changes to get them to build on CentOS 5

On the "good" vCenter, I tested on two different VM hosts:

Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

The "bad" vCenter VM host that I tested on had this configuration:

ESX Version: VMware ESXi, 5.0.0, 469512
Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz

Any ideas?

Thanks in advance,
David

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

[-- Attachment #1.2: Type: text/html, Size: 8318 bytes --]

[-- Attachment #2: Type: text/plain, Size: 155 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-09-03 15:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20998D40D9A2B7499CA5A3A2666CB1EB2D9DDA59@ZURMSG1.QUANTUM.com>
2014-12-11 15:36 ` Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent Mathieu Desnoyers
     [not found] ` <1745838195.26177.1418312190246.JavaMail.zimbra@efficios.com>
2015-01-12  6:33   ` David OShea
     [not found]   ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E464C@ZURMSG1.QUANTUM.com>
2015-01-12 15:34     ` Mathieu Desnoyers
     [not found]     ` <1205251820.37907.1421076877053.JavaMail.zimbra@efficios.com>
2015-01-12 15:36       ` Mathieu Desnoyers
     [not found]       ` <1686087516.37913.1421076968595.JavaMail.zimbra@efficios.com>
2015-01-15  2:45         ` David OShea
     [not found]         ` <20998D40D9A2B7499CA5A3A2666CB1EB2D9E72E6@ZURMSG1.QUANTUM.com>
2015-01-15  2:50           ` Mathieu Desnoyers
     [not found]           ` <214724236.41002.1421290246531.JavaMail.zimbra@efficios.com>
2015-09-03  2:14             ` David OShea
     [not found]             ` <20998D40D9A2B7499CA5A3A2666CB1EB5EAD6F13@ZURMSG1.QUANTUM.com>
2015-09-03 15:25               ` Mathieu Desnoyers
2014-12-08  3:30 David OShea
