* Re: lttng-health-check
From: Jérémie Galarneau @ 2014-03-20 19:57 UTC (permalink / raw)
  To: Thibault, Daniel; +Cc: lttng-dev

On Thu, Mar 20, 2014 at 2:27 PM, Thibault, Daniel
<Daniel.Thibault@drdc-rddc.gc.ca> wrote:
> LTTNG_HEALTH_APP_MANAGE:
> The lttng-health-check function will report the health of the application command socket manager subsystem.  This session daemon thread (thread_manage_apps) watches the application command sockets; their closure indicates application shutdown (more accurately, shutdown of the application's tracepoint provider) and triggers unregistration.
>
> Would it be correct to say that if this component malfunctions, closure of application trace files (if tracing in per-process ID mode) will be delayed until the session is destroyed, but that there won't be any other truly deleterious consequences?  It seems unlikely any events will be lost (the consumer will commit the buffer contents normally), and this thread failure would not hamper tracing of other applications nor client-controlled enabling and disabling of events.  The lttng list of event sources would include ghosts (dead-and-gone apps).  Could lttng hang if the client tried to toggle the enabling of events published by the ghosts?
>

It depends on where the "manage apps" thread hangs or dies. Both
scenarios are possible. The outcome depends largely on whether or
not the thread was holding a lock at that time.
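
The lock dependency can be illustrated with a small, generic Python
sketch. It is not taken from the session daemon's code: `run_scenario`,
`shared_lock` and `stuck_worker` are hypothetical names, and the
"client command" is modelled as nothing more than an attempt to take
the same lock with a timeout.

```python
import threading


def run_scenario(hold_lock: bool, timeout: float = 0.2) -> bool:
    """Return True if a client command can still take the shared lock
    after a worker thread has hung.

    hold_lock selects whether the worker hangs while holding the lock.
    All names here are illustrative, not LTTng identifiers.
    """
    shared_lock = threading.Lock()   # stands in for state shared between threads
    started = threading.Event()      # tells us the worker reached its hang point
    never_set = threading.Event()    # never set, so waiting on it models a hang

    def stuck_worker() -> None:
        if hold_lock:
            shared_lock.acquire()    # taken and never released
        started.set()
        never_set.wait()             # the thread is now stuck

    # daemon=True so the stuck thread does not keep the interpreter alive
    threading.Thread(target=stuck_worker, daemon=True).start()
    started.wait()

    # The "client command": it needs the same lock to make progress.
    acquired = shared_lock.acquire(timeout=timeout)
    if acquired:
        shared_lock.release()
    return acquired


if __name__ == "__main__":
    print("worker hung without the lock:", run_scenario(hold_lock=False))
    print("worker hung holding the lock:", run_scenario(hold_lock=True))
```

Without the timeout, the second scenario would block the caller
indefinitely, which is the kind of client hang asked about above.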

> I guess my point is that, of all the possible health failures, this seems the least catastrophic one.
>

One important thing to understand is that the health check is not an
error recovery mechanism; it is only meant as a tool to detect
unexpected internal errors from which we can only recover by
restarting.
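
As a rough illustration of "detect, do not recover", here is a minimal
heartbeat-style monitor in Python. It is a generic sketch and not the
lttng health API: `HealthMonitor`, `register`, `beat` and `check` are
hypothetical names. Note that the monitor only reports stalled
components; it contains no repair logic, and the caller is expected to
restart when anything is reported.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthMonitor:
    """Each watched component bumps a counter when it makes progress.

    check() flags components whose counter has not moved since the
    previous check. Nothing here attempts to repair a component.
    """
    counters: Dict[str, int] = field(default_factory=dict)
    last_seen: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str) -> None:
        self.counters[name] = 0

    def beat(self, name: str) -> None:
        # Called by the component itself whenever it makes progress.
        self.counters[name] += 1

    def check(self) -> List[str]:
        # A component is stalled if its counter equals the value
        # recorded by the previous check. On the very first check
        # nothing has been recorded yet, so nothing is flagged.
        stalled = sorted(
            name for name, count in self.counters.items()
            if self.last_seen.get(name) == count
        )
        self.last_seen = dict(self.counters)
        return stalled


if __name__ == "__main__":
    monitor = HealthMonitor()
    monitor.register("app_manage")
    monitor.register("cmd")
    monitor.check()                 # baseline
    monitor.beat("cmd")             # only "cmd" makes progress
    stalled = monitor.check()
    if stalled:
        print("stalled, restart required:", stalled)
```

A plain counter like this would also flag a thread that is
legitimately idle while waiting for input, so a real implementation
needs some way to tell "blocked on purpose" from "stuck"; the sketch
ignores that distinction.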

Regards,
Jérémie




-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com


* lttng-health-check
From: Thibault, Daniel @ 2014-03-20 18:27 UTC (permalink / raw)
  To: lttng-dev

LTTNG_HEALTH_APP_MANAGE:
The lttng-health-check function will report the health of the application command socket manager subsystem.  This session daemon thread (thread_manage_apps) watches the application command sockets; their closure indicates application shutdown (more accurately, shutdown of the application's tracepoint provider) and triggers unregistration.
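
The "socket closure triggers unregistration" behaviour described above
can be sketched generically in Python. This is not the code of
thread_manage_apps: `AppRegistry`, `register` and `reap_closed` are
hypothetical names, and a local socket pair stands in for an
application command socket.

```python
import selectors
import socket
from typing import Dict, List


class AppRegistry:
    """Watch one socket per registered application and unregister an
    application when the peer end of its socket is closed."""

    def __init__(self) -> None:
        self.selector = selectors.DefaultSelector()
        self.apps: Dict[str, socket.socket] = {}

    def register(self, name: str, sock: socket.socket) -> None:
        self.apps[name] = sock
        self.selector.register(sock, selectors.EVENT_READ, data=name)

    def reap_closed(self, timeout: float = 1.0) -> List[str]:
        """Wait up to `timeout` seconds, then unregister and return the
        names of applications whose socket was closed by the peer."""
        gone: List[str] = []
        for key, _events in self.selector.select(timeout):
            sock = key.fileobj
            try:
                # A closed stream socket becomes readable and yields
                # zero bytes; MSG_PEEK leaves real data unconsumed.
                closed = sock.recv(1, socket.MSG_PEEK) == b""
            except ConnectionResetError:
                closed = True
            if closed:
                self.selector.unregister(sock)
                sock.close()
                del self.apps[key.data]
                gone.append(key.data)
        return gone


if __name__ == "__main__":
    registry = AppRegistry()
    daemon_end, app_end = socket.socketpair()
    registry.register("app-a", daemon_end)
    app_end.close()                      # the application goes away
    print("unregistered:", registry.reap_closed())
```

If the loop calling `reap_closed` stopped running, entries for
applications that have exited would simply remain in `apps`, which
corresponds to the "ghosts" mentioned in the question below.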

Would it be correct to say that if this component malfunctions, closure of application trace files (if tracing in per-process ID mode) will be delayed until the session is destroyed, but that there won't be any other truly deleterious consequences?  It seems unlikely any events will be lost (the consumer will commit the buffer contents normally), and this thread failure would not hamper tracing of other applications nor client-controlled enabling and disabling of events.  The lttng list of event sources would include ghosts (dead-and-gone apps).  Could lttng hang if the client tried to toggle the enabling of events published by the ghosts?

I guess my point is that, of all the possible health failures, this seems the least catastrophic one.

Daniel U. Thibault
Protection des systèmes et contremesures (PSC) | Systems Protection & Countermeasures (SPC)
Cyber sécurité pour les missions essentielles (CME) | Mission Critical Cyber Security (MCCS)
R & D pour la défense Canada - Valcartier (RDDC Valcartier) | Defence R&D Canada - Valcartier (DRDC Valcartier)
2459 route de la Bravoure
Québec QC  G3J 1X5
CANADA
Vox : (418) 844-4000 x4245
Fax : (418) 844-4538
NAC : 918V QSDJ <http://www.travelgis.com/map.asp?addr=918V%20QSDJ>
Gouvernement du Canada | Government of Canada
<http://www.valcartier.drdc-rddc.gc.ca/>
