* [lttng-dev] lttng-consumerd can NOT get notification to consume ring buffer data
@ 2021-12-07 9:03 zhenyu.ren via lttng-dev
From: zhenyu.ren via lttng-dev @ 2021-12-07 9:03 UTC (permalink / raw)
To: lttng-dev
Hi, Lttng-dev:
I recently ran into a strange problem involving lttng-consumerd and CTF files. The CTF files belonging to some CPUs have stopped rotating, while the CTF files belonging to other CPUs keep working as usual. I am certain that all CPUs were producing spans the whole time.
#date; ls -ltrh channel0_0_* |tail -n 3; date;ls -ltrh channel0_1_* |tail -n 3
Tue Dec 7 16:25:45 CST 2021
-rw-rw---- 1 root root 1.8M Dec 7 16:20 channel0_0_17
-rw-rw---- 1 root root 2.0M Dec 7 16:23 channel0_0_18
-rw-rw---- 1 root root 916K Dec 7 16:24 channel0_0_19
Tue Dec 7 16:25:45 CST 2021
-rw-rw---- 1 root root 1.7M Dec 6 00:30 channel0_1_8
-rw-rw---- 1 root root 1.9M Dec 6 00:31 channel0_1_9
-rw-rw---- 1 root root 388K Dec 6 00:32 channel0_1_10
Notice that the CTF file for CPU 0 (channel0_0_19) was modified just now, but the CTF files for CPU 1 (channel0_1_10) stopped being updated on Dec 6.
I attached gdb to lttng-consumerd (breakpoints at lib_ring_buffer_channel_do_read() and lib_ring_buffer_poll_deliver() -- I configured a read timer on the channels). See the following annotated disassembly of lib_ring_buffer_poll_deliver():
0x00007f0f2418a445 <+213>: mov 0x0(%r13),%rcx rcx:0x7f0f0402e610(handle->table)
0x00007f0f2418a449 <+217>: mov 0x98(%rsi),%r9 rsi:0x7f0efb86f000(buf) r9:0x86c400000(consumed_old = buf->consumed)
0x00007f0f2418a450 <+224>: lea 0x150(%rsi),%rdi rdi:0x7f0efda75150
0x00007f0f2418a457 <+231>: mov 0x150(%rsi),%rax rax:3(cpu?)
0x00007f0f2418a45e <+238>: mov 0x78(%rbp),%rdx rbp:chan rdx:0x1000000(chan->backend.buf_size)
0x00007f0f2418a462 <+242>: mov 0x88(%rbp),%r8d r8d:0x14(20)(chan->backend.buf_size_order)
0x00007f0f2418a469 <+249>: cmp %rax,0x8(%rcx)
0x00007f0f2418a46d <+253>: jbe 0x7f0f2418a80c <lib_ring_buffer_channel_do_read+1180>
0x00007f0f2418a473 <+259>: shl $0x6,%rax rax:192
0x00007f0f2418a477 <+263>: sub $0x1,%rdx rdx:0xffffff 16777215 (chan->backend.buf_size - 1)
0x00007f0f2418a47b <+267>: lea 0x10(%rax,%rcx,1),%r10 rax:192 rcx:0x7f0f0402e610(handle->table) r10:0x7f0f0402e6e0
0x00007f0f2418a480 <+272>: and %r9,%rdx r9:0x8fdc00000 rdx:0xc00000 12582912 buf_trunc(consume_old, chan->backend.buf_size - 1)
0x00007f0f2418a483 <+275>: mov 0x8(%rdi),%rax rdi:0x7f0efda75150 rax:2688 ref_offset = (size_t) ref->offset
0x00007f0f2418a487 <+279>: mov %r8d,%ecx ecx:0x14(20)
0x00007f0f2418a48a <+282>: shr %cl,%rdx (....)
0x00007f0f2418a48d <+285>: shl $0x7,%rdx
0x00007f0f2418a491 <+289>: add %rax,%rdx
0x00007f0f2418a494 <+292>: lea 0x80(%rdx),%rax
0x00007f0f2418a49b <+299>: cmp 0x28(%r10),%rax
0x00007f0f2418a49f <+303>: ja 0x7f0f2418a80c <lib_ring_buffer_channel_do_read+1180>
0x00007f0f2418a4a5 <+309>: mov 0x20(%r10),%rax r10:0x7f0f0402e6e0 rax:0x7f0efb86f000(buf)
0x00007f0f2418a4a9 <+313>: add %rdx,%rax rdx:3200 rax:0x7f0efb86fc80
0x00007f0f2418a4ac <+316>: mov (%rax),%rax rax:0x86c00000(commit_count!!!!!!Incremented _once_ at sb switch cc_sb)
0x00007f0f2418a4af <+319>: mov 0x80(%rbp),%r8 r8:0x100000(chan->backend.subbuf_size)
0x00007f0f2418a4b6 <+326>: mov 0x78(%rbp),%rdx rdx:0x1000000(chan->backend.buf_size)
0x00007f0f2418a4ba <+330>: mov 0x8c(%rbp),%ecx ecx:4
0x00007f0f2418a4c0 <+336>: mov 0x80(%rsi),%rdi rdi:0x86d400000
0x00007f0f2418a4c7 <+343>: sub %r8,%rax rax:0x86b00000(commit_count - chan->backend.subbuf_size)
0x00007f0f2418a4ca <+346>: and 0x8(%rbp),%rax rax & chan->commit_count_mask = rax:0x86b00000
0x00007f0f2418a4ce <+350>: neg %rdx rdx:0x1000000(16M chan->backend.buf_size) --> 0xffffffffff000000(-16777216)
0x00007f0f2418a4d1 <+353>: and %r9,%rdx r9:0x86c400000(consume_old) rdx:0x86c000000
0x00007f0f2418a4d4 <+356>: shr %cl,%rdx cl:4 rdx:0x86c00000
0x00007f0f2418a4d7 <+359>: cmp %rdx,%rax rax:0x86b00000 rdx:0x86c00000
0x00007f0f2418a4da <+362>: jne 0x7f0f2418a411 <lib_ring_buffer_channel_do_read+161>
0x00007f0f2418a4e0 <+368>: mov %r8,%rax
209 static inline
210 int lib_ring_buffer_poll_deliver(const struct lttng_ust_lib_ring_buffer_config *config,
211 struct lttng_ust_lib_ring_buffer *buf,
212 struct channel *chan,
213 struct lttng_ust_shm_handle *handle)
214 {
215 unsigned long consumed_old, consumed_idx, commit_count, write_offset;
216
217 consumed_old = uatomic_read(&buf->consumed);
218 consumed_idx = subbuf_index(consumed_old, chan);
219 commit_count = v_read(config, &shmp_index(handle, buf->commit_cold, consumed_idx)->cc_sb);
220 /*
221 * No memory barrier here, since we are only interested
222 * in a statistically correct polling result. The next poll will
223 * get the data if we are racing. The mb() that ensures correct
224 * memory order is in get_subbuf.
225 */
226 write_offset = v_read(config, &buf->offset);
227
228 /*
229 * Check that the subbuffer we are trying to consume has been
230 * already fully committed.
231 */
232
233 if (((commit_count - chan->backend.subbuf_size)
234 & chan->commit_count_mask)
235 - (buf_trunc(consumed_old, chan)
236 >> chan->backend.num_subbuf_order)
237 != 0)
238 return 0;
239
240 /*
241 * Check that we are not about to read the same subbuffer in
242 * which the writer head is.
243 */
244 if (subbuf_trunc(write_offset, chan) - subbuf_trunc(consumed_old, chan)
245 == 0)
246 return 0;
247
248 return 1;
249
250 }
It seems that lib_ring_buffer_channel_do_read()->lib_ring_buffer_poll_deliver() returned 0 at line 238 ($consumed_old: 0x86c400000, $commit_count: 0x86c00000, $write_offset: 0x86d400000).
Buffer type: per UID
Channels:
-------------
- channel0: [enabled]
Attributes:
overwrite mode: 0
sub-buffer size: 1048576
number of sub-buffers: 16
switch timer interval: 20000000
read timer interval: 60000000
trace file count: 80
trace file size (bytes): 2097152
output: mmap()
PS: The LTTng version is 2.7 (I know it is a very old version :) ).
Do you have any ideas about this problem? Thanks in advance.
Thanks a lot
zhenyu.ren
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
* Re: [lttng-dev] lttng-consumerd can NOT get notification to consume ring buffer data
From: Mathieu Desnoyers via lttng-dev @ 2021-12-07 20:07 UTC (permalink / raw)
To: zhenyu.ren; +Cc: lttng-dev
Hi,
Can you try:
- Reproduce with a more recent LTTng-UST/LTTng-Tools (e.g. 2.13)? LTTng 2.7 is not supported anymore.
- Try to reproduce with "per-pid" UST buffers rather than "per-uid".
- Try to reproduce without the "tracefile rotation" mode (without the --tracefile-count and --tracefile-size options on the channel).
Also, do you happen to have traced applications which are frequently killed asynchronously while some of their threads are actively tracing, by any chance?
Thanks,
Mathieu
----- On Dec 7, 2021, at 4:03 AM, lttng-dev <lttng-dev@lists.lttng.org> wrote:
> [original message quoted in full, trimmed]
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* [lttng-dev] Re: lttng-consumerd can NOT get notification to consume ring buffer data
From: zhenyu.ren via lttng-dev @ 2021-12-08 1:47 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: lttng-dev
Hi, Desnoyers
For various reasons, none of the three suggestions you made about reproducing the problem can be carried out easily.
I also wonder why the commit counter is not updated... It seems that the span producers did not complete their commit (were they killed? did they hang?). It's very hard to debug all the span producers since there are so many of them...
So, can I make lttng-consumerd just skip this sub-buffer and try to consume the next one?
Thanks
zhenyu.ren
------------------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Sent: Wednesday, December 8, 2021, 04:17
To: zhenyu.ren <zhenyu.ren@aliyun.com>
Cc: lttng-dev <lttng-dev@lists.lttng.org>
Subject: Re: [lttng-dev] lttng-consumerd can NOT get notification to consume ring buffer data
[previous messages quoted in full, trimmed]
* Re: [lttng-dev] Re: lttng-consumerd can NOT get notification to consume ring buffer data
From: Mathieu Desnoyers via lttng-dev @ 2021-12-09 20:20 UTC (permalink / raw)
To: zhenyu.ren; +Cc: lttng-dev
----- On Dec 7, 2021, at 8:47 PM, zhenyu.ren <zhenyu.ren@aliyun.com> wrote:
> Hi, Desnoyers
> For various reasons, none of the three suggestions you made about
> reproducing the problem can be carried out easily.
> I also wonder why the commit counter is not updated... It seems that the
> span producers did not complete their commit (were they killed? did they
> hang?). It's very hard to debug all the span producers since there are so
> many of them...
> So, can I make lttng-consumerd just skip this sub-buffer and try to consume
> the next one?
No, it's not that simple. Once this state is reached (e.g. caused by a killed application), just skipping this sub-buffer
will not allow applications to continue using this buffer for tracing until it is destroyed.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* [lttng-dev] Re: Re: lttng-consumerd can NOT get notification to consume ring buffer data
From: zhenyu.ren via lttng-dev @ 2021-12-10 1:42 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: lttng-dev
>No, it's not that simple
Can I update that sub-buffer's commit_cold counter (and what about the hot one?) to the "right" value, and then skip consuming this sub-buffer (while updating the consumed counter as if I had consumed it)?
It seems tracing would then continue, at the cost of losing all the spans in that sub-buffer.
Thanks
zhenyu.ren
------------------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Sent: Friday, December 10, 2021, 04:27
To: zhenyu.ren <zhenyu.ren@aliyun.com>
Cc: lttng-dev <lttng-dev@lists.lttng.org>
Subject: Re: Re: [lttng-dev] lttng-consumerd can NOT get notification to consume ring buffer data
[previous messages quoted in full, trimmed]