lttng-dev.lists.lttng.org archive mirror
 help / color / mirror / Atom feed
* [lttng-dev] lttng live event loss with babeltrace2
@ 2021-05-04  0:05 Eqbal via lttng-dev
  2021-05-04 13:16 ` Jonathan Rajotte-Julien via lttng-dev
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eqbal via lttng-dev @ 2021-05-04  0:05 UTC (permalink / raw)
  To: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 1299 bytes --]

Hi,

I have a lttng live session trace consumer application using libbabeltrace2
where I create a graph to consume lttng live session traces and output to
another sink. I am running the graph in a loop at some polling interval as
long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is that
if my polling interval is large enough I tend to lose either all or some of
the events. I experimented with various polling intervals and it seems if
the polling interval is less than *DELAYUS *from "lttng-create
--live=DELAYUS" option then I am able to get all the events, otherwise I
tend to lose events.

Here are the steps I follow:
1. start session daemon and relay daemon
2. create a live session (with default delay of 1s), enable events and start
3. Start my application (hello world example from lttng docs)
4. Start the consumer application built using libbabeltrace that connects
to the live session

I noticed that the events are actually persisted in the ~/lttng-traces by
the relay daemon, but it does not reach babeltrace consumer application. I
have noticed the same behavior with babeltrace2 cli.

I would like to understand what is the reason for such behavior and if
playing with the polling interval in relation to the DELAYUS value is the
right thing to do.

Thanks,
Eqbal

[-- Attachment #1.2: Type: text/html, Size: 1561 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [lttng-dev] lttng live event loss with babeltrace2
  2021-05-04  0:05 [lttng-dev] lttng live event loss with babeltrace2 Eqbal via lttng-dev
@ 2021-05-04 13:16 ` Jonathan Rajotte-Julien via lttng-dev
  2021-05-04 14:44 ` Jonathan Rajotte-Julien via lttng-dev
  2021-05-07 18:05 ` Jonathan Rajotte-Julien via lttng-dev
  2 siblings, 0 replies; 6+ messages in thread
From: Jonathan Rajotte-Julien via lttng-dev @ 2021-05-04 13:16 UTC (permalink / raw)
  To: Eqbal; +Cc: lttng-dev

Hi,

On Mon, May 03, 2021 at 05:05:10PM -0700, Eqbal via lttng-dev wrote:
> Hi,
> 
> I have a lttng live session trace consumer application using libbabeltrace2
> where I create a graph to consume lttng live session traces and output to
> another sink. I am running the graph in a loop at some polling interval as
> long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is that
> if my polling interval is large enough I tend to lose either all or some of
> the events. I experimented with various polling intervals and it seems if
> the polling interval is less than *DELAYUS *from "lttng-create
> --live=DELAYUS" option then I am able to get all the events, otherwise I
> tend to lose events.
> 
> Here are the steps I follow:
> 1. start session daemon and relay daemon
> 2. create a live session (with default delay of 1s), enable events and start
> 3. Start my application (hello world example from lttng docs)
> 4. Start the consumer application built using libbabeltrace that connects
> to the live session
> 
> I noticed that the events are actually persisted in the ~/lttng-traces by
> the relay daemon, but it does not reach babeltrace consumer application. I
> have noticed the same behavior with babeltrace2 cli.

Could you also test against babeltrace 1.5? It might give us a bit of a head
start to debug this if it turns out to be unexpected behaviour.

-- 
Jonathan Rajotte-Julien
EfficiOS
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [lttng-dev] lttng live event loss with babeltrace2
  2021-05-04  0:05 [lttng-dev] lttng live event loss with babeltrace2 Eqbal via lttng-dev
  2021-05-04 13:16 ` Jonathan Rajotte-Julien via lttng-dev
@ 2021-05-04 14:44 ` Jonathan Rajotte-Julien via lttng-dev
  2021-05-07 18:05 ` Jonathan Rajotte-Julien via lttng-dev
  2 siblings, 0 replies; 6+ messages in thread
From: Jonathan Rajotte-Julien via lttng-dev @ 2021-05-04 14:44 UTC (permalink / raw)
  To: Eqbal; +Cc: lttng-dev

On Mon, May 03, 2021 at 05:05:10PM -0700, Eqbal via lttng-dev wrote:
> Hi,
> 
> I have a lttng live session trace consumer application using libbabeltrace2
> where I create a graph to consume lttng live session traces and output to
> another sink. I am running the graph in a loop at some polling interval as
> long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is that
> if my polling interval is large enough I tend to lose either all or some of
> the events. I experimented with various polling intervals and it seems if
> the polling interval is less than *DELAYUS *from "lttng-create
> --live=DELAYUS" option then I am able to get all the events, otherwise I
> tend to lose events.
> 
> Here are the steps I follow:
> 1. start session daemon and relay daemon
> 2. create a live session (with default delay of 1s), enable events and start
> 3. Start my application (hello world example from lttng docs)

Not sure if you modified it in any way, but be careful with short lived apps
since an app can terminate before lttng-ust have a chance to register.

> 4. Start the consumer application built using libbabeltrace that connects
> to the live session

hmm. Note that when attaching to a session it does not start at the beginning of the
trace collected by lttng-relayd, it start at the last received data from
lttng-relayd from the lttng-consumerd (LTTNG_VIEWER_SEEK_LAST).

Hence I would recommend that these steps be inversed:

4. Start the consumer application built using libbabeltrace that connects
to the live session
3. Start my application (hello world example from lttng docs)


> 
> I noticed that the events are actually persisted in the ~/lttng-traces by
> the relay daemon, but it does not reach babeltrace consumer application. I
> have noticed the same behavior with babeltrace2 cli.
> 
> I would like to understand what is the reason for such behavior and if
> playing with the polling interval in relation to the DELAYUS value is the
> right thing to do.

I think I reproduced the issue but I'm not completely sure it is the same
problem. Please file an issue on the bug tracker [1] with
as much information as possible, the exact lttng
commands used, the current behaviour and the expected behaviour. 
I'll add my findings if relevant.

But I think it might be a weird handling of how we handle the first "empty"
retry and the subsequent get phase. After the initial phase everything seems to
work as expected.

[1] https://bugs.lttng.org/


-- 
Jonathan Rajotte-Julien
EfficiOS
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [lttng-dev] lttng live event loss with babeltrace2
  2021-05-04  0:05 [lttng-dev] lttng live event loss with babeltrace2 Eqbal via lttng-dev
  2021-05-04 13:16 ` Jonathan Rajotte-Julien via lttng-dev
  2021-05-04 14:44 ` Jonathan Rajotte-Julien via lttng-dev
@ 2021-05-07 18:05 ` Jonathan Rajotte-Julien via lttng-dev
  2021-05-11  0:35   ` Eqbal via lttng-dev
  2 siblings, 1 reply; 6+ messages in thread
From: Jonathan Rajotte-Julien via lttng-dev @ 2021-05-07 18:05 UTC (permalink / raw)
  To: Eqbal; +Cc: lttng-dev

Hi,

See https://github.com/lttng/lttng-tools/commit/c876d657fb08a91ca7c907b92f1b7604aee664ee . 

We would appreciate your feedback on this if possible based on your use case.

Cheers

On Mon, May 03, 2021 at 05:05:10PM -0700, Eqbal via lttng-dev wrote:
> Hi,
> 
> I have a lttng live session trace consumer application using libbabeltrace2
> where I create a graph to consume lttng live session traces and output to
> another sink. I am running the graph in a loop at some polling interval as
> long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is that
> if my polling interval is large enough I tend to lose either all or some of
> the events. I experimented with various polling intervals and it seems if
> the polling interval is less than *DELAYUS *from "lttng-create
> --live=DELAYUS" option then I am able to get all the events, otherwise I
> tend to lose events.
> 
> Here are the steps I follow:
> 1. start session daemon and relay daemon
> 2. create a live session (with default delay of 1s), enable events and start
> 3. Start my application (hello world example from lttng docs)
> 4. Start the consumer application built using libbabeltrace that connects
> to the live session
> 
> I noticed that the events are actually persisted in the ~/lttng-traces by
> the relay daemon, but it does not reach babeltrace consumer application. I
> have noticed the same behavior with babeltrace2 cli.
> 
> I would like to understand what is the reason for such behavior and if
> playing with the polling interval in relation to the DELAYUS value is the
> right thing to do.
> 
> Thanks,
> Eqbal

> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


-- 
Jonathan Rajotte-Julien
EfficiOS
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [lttng-dev] lttng live event loss with babeltrace2
  2021-05-07 18:05 ` Jonathan Rajotte-Julien via lttng-dev
@ 2021-05-11  0:35   ` Eqbal via lttng-dev
  2021-05-11  1:14     ` Jonathan Rajotte-Julien via lttng-dev
  0 siblings, 1 reply; 6+ messages in thread
From: Eqbal via lttng-dev @ 2021-05-11  0:35 UTC (permalink / raw)
  To: Jonathan Rajotte-Julien; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 2528 bytes --]

Hi,

I ran my test with the new changes and it seems to improve the lossiness
but not eliminate it. For example it seems in the hello world example that
if the application exits too quickly after emitting the tracepoint, we
might still lose some events in the consumer even though they are present
in the relayd output folder. I don't quite understand why that is. Would
love some explanations on the various timing windows where we have a
potential to lose the events on the consumer side.

Thanks,
Eqbal

On Fri, May 7, 2021 at 11:05 AM Jonathan Rajotte-Julien <
jonathan.rajotte-julien@efficios.com> wrote:

> Hi,
>
> See
> https://github.com/lttng/lttng-tools/commit/c876d657fb08a91ca7c907b92f1b7604aee664ee
> .
>
> We would appreciate your feedback on this if possible based on your use
> case.
>
> Cheers
>
> On Mon, May 03, 2021 at 05:05:10PM -0700, Eqbal via lttng-dev wrote:
> > Hi,
> >
> > I have a lttng live session trace consumer application using
> libbabeltrace2
> > where I create a graph to consume lttng live session traces and output to
> > another sink. I am running the graph in a loop at some polling interval
> as
> > long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is
> that
> > if my polling interval is large enough I tend to lose either all or some
> of
> > the events. I experimented with various polling intervals and it seems if
> > the polling interval is less than *DELAYUS *from "lttng-create
> > --live=DELAYUS" option then I am able to get all the events, otherwise I
> > tend to lose events.
> >
> > Here are the steps I follow:
> > 1. start session daemon and relay daemon
> > 2. create a live session (with default delay of 1s), enable events and
> start
> > 3. Start my application (hello world example from lttng docs)
> > 4. Start the consumer application built using libbabeltrace that connects
> > to the live session
> >
> > I noticed that the events are actually persisted in the ~/lttng-traces by
> > the relay daemon, but it does not reach babeltrace consumer application.
> I
> > have noticed the same behavior with babeltrace2 cli.
> >
> > I would like to understand what is the reason for such behavior and if
> > playing with the polling interval in relation to the DELAYUS value is the
> > right thing to do.
> >
> > Thanks,
> > Eqbal
>
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev@lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>

[-- Attachment #1.2: Type: text/html, Size: 3448 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [lttng-dev] lttng live event loss with babeltrace2
  2021-05-11  0:35   ` Eqbal via lttng-dev
@ 2021-05-11  1:14     ` Jonathan Rajotte-Julien via lttng-dev
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Rajotte-Julien via lttng-dev @ 2021-05-11  1:14 UTC (permalink / raw)
  To: Eqbal; +Cc: lttng-dev

Hi, 

> I ran my test with the new changes and it seems to improve the lossiness but not
> eliminate it. For example it seems in the hello world example that if the
> application exits too quickly after emitting the tracepoint, we might still
> lose some events in the consumer even though they are present in the relayd
> output folder.

Here by consumer you mean a "live reader" right? If so I would suggest you use the term 
"live consumer reader" or "live reader" since "consumer" alone can yield some confusion 
with lttng-consumerd. 

I would need a reproducer for this, you can base yourself on the reproducer in the commit message for the previous fix. 
AFAIK there is no valid reason for this to happen, at 
least based on my review of the code for the fix we just merged.
(This is only valid for per-uid tracing, per-pid tracing and lttng live have not so minor limitation inherent to the current lttng-live implementation) 

> I don't quite understand why that is. Would love some explanations on the
> various timing windows where we have a potential to lose the events on the
> consumer side.

From a live reader perspective, in per-uid mode, there should be no "loss" of event from the moment the viewer is attached and 
the moment it detach. Here "loss" mean that the events are present in the trace gathered by lttng-relayd but not by the live 
reader for the same "time window". This is only valid if the viewer is not "late" and have consumed everything at the moment it detach.


Cheers

----- Original Message -----
> From: "Eqbal" <eqbalzee@gmail.com>
> To: "Jonathan Rajotte-Julien" <jonathan.rajotte-julien@efficios.com>
> Cc: "lttng-dev" <lttng-dev@lists.lttng.org>
> Sent: Monday, May 10, 2021 8:35:41 PM
> Subject: Re: [lttng-dev] lttng live event loss with babeltrace2

Hi, 

> I ran my test with the new changes and it seems to improve the lossiness but not
> eliminate it. For example it seems in the hello world example that if the
> application exits too quickly after emitting the tracepoint, we might still
> lose some events in the consumer even though they are present in the relayd
> output folder.

Here by consumer you mean a "live reader" right? If so I would suggest you use the term "live consumer reader" or "live reader" since "consumer" alone can yield some confusion with lttng-consumerd. 

I would need a reproducer for this, you can base yourself on the reproducer in the commit message for the previous fix. 
AFAIK there is no valid reason for this to happen, at least based on my review of the code for the fix we just merged. (This is only valid for per-uid tracing, per-pid tracing and lttng live have not so minor limitation inherent to the current lttng-live implementation) 

> I don't quite understand why that is. Would love some explanations on the
> various timing windows where we have a potential to lose the events on the
> consumer side.

From a live reader perspective, in per-uid mode, there should be no "loss" of event from the moment the viewer is attached and the moment it detach. Here "loss" mean that the events are present in the trace gathered by lttng-relayd but not by the live reader for the same "time window". This is only valid if the viewer is not "late" and have consumed everything at the detach moment. 

Cheers 
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-11  1:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04  0:05 [lttng-dev] lttng live event loss with babeltrace2 Eqbal via lttng-dev
2021-05-04 13:16 ` Jonathan Rajotte-Julien via lttng-dev
2021-05-04 14:44 ` Jonathan Rajotte-Julien via lttng-dev
2021-05-07 18:05 ` Jonathan Rajotte-Julien via lttng-dev
2021-05-11  0:35   ` Eqbal via lttng-dev
2021-05-11  1:14     ` Jonathan Rajotte-Julien via lttng-dev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).