All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd
       [not found] <20171006182247.6874-1-jonathan.rajotte-julien@efficios.com>
@ 2017-10-06 18:22 ` Jonathan Rajotte
  2017-12-04 15:23 ` [PATCH lttng-tools v3 1/2] lttng-relayd: use tcp keep-alive mechanism to detect dead-peer Jérémie Galarneau
       [not found] ` <20171006182247.6874-2-jonathan.rajotte-julien@efficios.com>
  2 siblings, 0 replies; 4+ messages in thread
From: Jonathan Rajotte @ 2017-10-06 18:22 UTC (permalink / raw)
  To: lttng-dev; +Cc: jgalar

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
---
 doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
index 218526a1..e37ce845 100644
--- a/doc/man/lttng-relayd.8.txt
+++ b/doc/man/lttng-relayd.8.txt
@@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
 `LTTNG_RELAYD_HEALTH`::
     Path to relay daemon health's socket.
 
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
+    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
+    peers.
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
+    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
+    A value of -1 lets the operating system manage this parameter (default).
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
+    See tcp(7) tcp_keepalive_probes.
+    A value of -1 lets the operating system manage this parameter (default).
+
+NOTE: Not supported on Solaris.
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
+    See tcp(7) tcp_keepalive_intvl.
+    A value of -1 lets the operating system manage this parameter (default).
+
+NOTE: Not supported on Solaris.
 
 FILES
 -----
-- 
2.11.0

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd
       [not found] ` <20171006182247.6874-2-jonathan.rajotte-julien@efficios.com>
@ 2017-11-23 19:03   ` Jérémie Galarneau
  2017-12-04 15:23   ` Jérémie Galarneau
  1 sibling, 0 replies; 4+ messages in thread
From: Jérémie Galarneau @ 2017-11-23 19:03 UTC (permalink / raw)
  To: Philippe Proulx; +Cc: lttng-dev, Jeremie Galarneau

Phil, any comments on this patch?

Jérémie

On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien@efficios.com> wrote:
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> ---
>  doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
> index 218526a1..e37ce845 100644
> --- a/doc/man/lttng-relayd.8.txt
> +++ b/doc/man/lttng-relayd.8.txt
> @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
>  `LTTNG_RELAYD_HEALTH`::
>      Path to relay daemon health's socket.
>
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
> +    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
> +    peers.
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
> +    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
> +    See tcp(7) tcp_keepalive_probes.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +NOTE: Not supported on Solaris.
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
> +    See tcp(7) tcp_keepalive_intvl.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +NOTE: Not supported on Solaris.
>
>  FILES
>  -----
> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH lttng-tools v3 1/2] lttng-relayd: use tcp keep-alive mechanism to detect dead-peer
       [not found] <20171006182247.6874-1-jonathan.rajotte-julien@efficios.com>
  2017-10-06 18:22 ` [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd Jonathan Rajotte
@ 2017-12-04 15:23 ` Jérémie Galarneau
       [not found] ` <20171006182247.6874-2-jonathan.rajotte-julien@efficios.com>
  2 siblings, 0 replies; 4+ messages in thread
From: Jérémie Galarneau @ 2017-12-04 15:23 UTC (permalink / raw)
  To: Jonathan Rajotte; +Cc: lttng-dev, Jeremie Galarneau

Hi,

Just got around to reviewing both patches. Read on.

On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien@efficios.com> wrote:
> v3:
> - Fix inversion of definition for tcp_keepalive_time_valid.
> - Handle value of -1 and value < -1 in tcp_keepalive_time_valid
> - Allow value of -1 for LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME,
>   LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES, LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL
>   on Solaris 10 & 11.
> - Fix LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME=-1 cases for solaris which would
>   yield -1000 as final modifier value.
>
> v2:
> -  Replace __sun -> __sun__
> -  Use MS_PER_SEC
> -  Modified definition of tcp_keepalive_time_valid to increase
>    readability.
> -  In tcp_keepalive_time_valid return true if value is -1 since it is
>    equivalent to letting the system manage the setting.
> -  Removed debugging getsockopt calls
>
> --
> Allow relayd to clean-up objects related to a dead connection
> for which the FIN packet was no emitted (Unexpected shutdown,
> ethernet blocking). Note that an idle peer is not considered dead given
> that it respond to the keep-alive query after the idle time is elapsed.
>
> By RFC 1122-4.2.3.6 implementation must default to no less than two
> hours for the idle period. On linux the default value is indeed 2 hours.
> This could be problematic if relayd should be aggressive regarding
> dead-peers. Hence it is important to provide tuning knob regarding the
> tcp keep-alive mechanism.
>
> The following environments variable can be used to enable and fine-tune
> it:
>     LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE
>         Set to 1 to enable the use of tcp keep-alive allowing the detection
>         of dead peers.
>
>     LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME
>         See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on
>         Solaris 11.
>         A value of -1 lets the operating system manage this parameter
>         (default).
>
>     LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES
>         See tcp(7) tcp_keepalive_probes.
>         A value of -1 lets the operating system manage this
>         parameter (default).
>         No effect on Solaris.
>
>     LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
>         See tcp(7) tcp_keepalive_intvl.
>         A value of -1 lets the operating system manage
>         his parameter (default).
>

See comments on the second patch for naming conventions.

> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> ---
>  src/bin/lttng-relayd/Makefile.am      |   3 +-
>  src/bin/lttng-relayd/main.c           |  14 ++
>  src/bin/lttng-relayd/tcp_keep_alive.c | 250 ++++++++++++++++++++++++++++++++++
>  src/bin/lttng-relayd/tcp_keep_alive.h |  25 ++++
>  4 files changed, 291 insertions(+), 1 deletion(-)
>  create mode 100644 src/bin/lttng-relayd/tcp_keep_alive.c
>  create mode 100644 src/bin/lttng-relayd/tcp_keep_alive.h
>
> diff --git a/src/bin/lttng-relayd/Makefile.am b/src/bin/lttng-relayd/Makefile.am
> index c7dd37e1..5125c721 100644
> --- a/src/bin/lttng-relayd/Makefile.am
> +++ b/src/bin/lttng-relayd/Makefile.am
> @@ -21,7 +21,8 @@ lttng_relayd_SOURCES = main.c lttng-relayd.h utils.h utils.c cmd.h \
>                         stream-fd.c stream-fd.h \
>                         connection.c connection.h \
>                         viewer-session.c viewer-session.h \
> -                       tracefile-array.c tracefile-array.h
> +                       tracefile-array.c tracefile-array.h \
> +                       tcp_keep_alive.c tcp_keep_alive.h
>
>  # link on liblttngctl for check if relayd is already alive.
>  lttng_relayd_LDADD = -lurcu-common -lurcu \
> diff --git a/src/bin/lttng-relayd/main.c b/src/bin/lttng-relayd/main.c
> index 0eb8e282..b43743a5 100644
> --- a/src/bin/lttng-relayd/main.c
> +++ b/src/bin/lttng-relayd/main.c
> @@ -70,6 +70,7 @@
>  #include "stream.h"
>  #include "connection.h"
>  #include "tracefile-array.h"
> +#include "tcp_keep_alive.h"
>
>  static const char *help_msg =
>  #ifdef LTTNG_EMBED_HELP
> @@ -899,6 +900,14 @@ restart:
>                                         lttcomm_destroy_sock(newsock);
>                                         goto error;
>                                 }
> +
> +                               ret = tcp_keepalive_setsockopt(newsock->fd);

Rename to socket_apply_keep_alive_config(...)

Note the change from KEEPALIVE to KEEP_ALIVE which should be
used, except when required by system definitions.

> +                               if (ret < 0) {
> +                                       PERROR("setsockopt tcp_keep_alive");
> +                                       lttcomm_destroy_sock(newsock);
> +                                       goto error;
> +                               }
> +
>                                 new_conn = connection_create(newsock, type);
>                                 if (!new_conn) {
>                                         lttcomm_destroy_sock(newsock);
> @@ -2755,6 +2764,11 @@ int main(int argc, char **argv)
>                 goto exit_options;
>         }
>
> +       if (tcp_keepalive_get_settings()) {

This can be done lazily on the first socket_apply_keep_alive_config(...).

> +               retval = -1;
> +               goto exit_options;
> +       }
> +
>         if (set_signal_handler()) {
>                 retval = -1;
>                 goto exit_options;
> diff --git a/src/bin/lttng-relayd/tcp_keep_alive.c b/src/bin/lttng-relayd/tcp_keep_alive.c
> new file mode 100644
> index 00000000..c6498787
> --- /dev/null
> +++ b/src/bin/lttng-relayd/tcp_keep_alive.c
> @@ -0,0 +1,250 @@
> +/*
> + * Copyright (C) 2017 - Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2 only,
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include <sys/types.h>
> +#include <netinet/tcp.h>
> +#include <stdbool.h>
> +#include <sys/socket.h>
> +
> +#include <common/compat/getenv.h>
> +#include <common/time.h>
> +
> +#include "tcp_keep_alive.h"
> +
> +#define LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE"
> +#define LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME"
> +#define LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES"
> +#define LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL"

See comments regarding the naming of those variables on the second patch.

Also, those definitions belong in defaults.h

> +
> +#ifdef __sun__
> +#define COMPAT_TCP_KEEPIDLE TCP_KEEPALIVE_THRESHOLD
> +#define SOL_TCP IPPROTO_TCP /* Solaris does not support SOL_TCP */
> +#else
> +#define COMPAT_TCP_KEEPIDLE TCP_KEEPIDLE
> +#endif /* __sun__ */
> +
> +static bool tcp_keepalive_enabled = false;
> +static int tcp_keepalive_time = -1;
> +static int tcp_keepalive_intvl = -1;
> +static int tcp_keepalive_probes = -1;
> +
> +#ifdef __sun__
> +static bool tcp_keepalive_time_valid(int value)
> +{
> +       bool ret;

Missing whiteline.

> +       if (value < -1) {
> +               ret = false;
> +               goto end;
> +       }
> +       if (value == -1) {
> +               /* Let the system manage the parameter */

Missing '.' at the end of the comment.

> +               ret = true;
> +               goto end;
> +       }
> +#ifdef TCP_KEEPALIVE_THRESHOLD

The various #ifdefs are making this file hard to follow. Since
there seems to be fundamental differences between what Solaris
and Linux support here, please implement the OS-specific logic
in different files.

You could use a simple internal API of the form

struct tcp_keep_alive_support {
    /* TCP keep-alive is supported by this platform. */
    bool supported;
    /* Overriding idle-time per application/socket is supported by
this platform. */
    bool idle_time_supported;
    /* Overriding probe interval per application/socket is supported
by this platform. */
    bool probe_interval_supported;
    /* Configuring max probe count per application/socket is supported
by this platform. */
    bool max_probe_count_supported;
};

/* Implement per-platform. */
static
int tcp_keep_alive_init_support(struct tcp_keep_alive_support *support);

struct tcp_keep_alive_config {
    bool enabled;
    int idle_time;
    int probe_interval;
    int max_probe_count;
};

/* Retrieve settings from env vars and check/warn if supported by platform. */
static
int tcp_keep_alive_init_config(struct tcp_keep_alive_config *config);

And then re-use that config when applying the options to the sockets

> +       /*
> +        * Minimum 10s , maximum 10 days. Defined by
> +        * https://docs.oracle.com/cd/E23824_01/html/821-1475/tcp-7p.html#REFMAN7tcp-7p
> +        */
> +       if (value < 10 || value > 864000) {

Report this error through logging.

> +               ret = false;
> +               goto end;
> +       }
> +       ret = true;
> +#else
> +       WARN("Solaris 10 does not support local override of the TCP_KEEP_ALIVE_THRESHOLD. " LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME_ENV);

That message is a bit confusing. If by "local" you mean per-socket or
per-application override, specify it (otherwise I may be
misunderstanding what this means).

Also, mentioning "TCP_KEEP_ALIVE_THRESHOLD" may confuse users since we
don't use that terminology in the documentation.

Also, if this option exists as a system-wide setting on Solaris, it
would be helpful to mention it here.

As for the wording, I can propose

"Overriding the TCP keep-alive idle time threshold per-application is
not supported by this platform. Ignoring the
LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME environment variable."


> +       ret = false;
> +       goto end;
> +#endif /* TCP_KEEPALIVE_THRESHOLD */
> +end:
> +       return ret;
> +}
> +#else
> +static bool tcp_keepalive_time_valid(int value)
> +{
> +       return value >= -1;
> +}
> +#endif /* __sun__ */
> +
> +int tcp_keepalive_get_settings(void)
> +{
> +       int ret;
> +       const char *value;
> +
> +       value = lttng_secure_getenv(LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE_ENV);
> +       if (value && !strcmp(value, "1")) {
> +               tcp_keepalive_enabled = true;
> +       } else {
> +               tcp_keepalive_enabled = false;
> +               ret = 0;
> +               goto disabled;
> +       }
> +
> +       /* Get value for tcp_keepalive_time in seconds*/
> +       value = lttng_secure_getenv(LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME_ENV);
> +       if (value) {
> +               int tmp;

Missing whiteline.

> +               errno = 0;
> +               tmp = (int) strtol(value, NULL, 0);

This cast is not safe. Store the result in a long int and validate
that the range is acceptable.

Let's do it in a helper function and reuse it for the INTVL
setting (duplicated code).

> +               if (errno != 0) {
> +                       PERROR("TCP_KEEP_ALIVE time parse");
> +                       ret = 1;
> +                       goto error;
> +               }
> +
> +               if (!tcp_keepalive_time_valid(tmp)) {
> +                       ERR("TCP_KEEP_ALIVE time invalid value");
> +                       ret = 1;
> +                       goto error;
> +               }
> +#ifdef __sun__
> +               /*
> +                * Under solaris this value is expressed in
> +                * milliseconds. Fits in a int.
> +                */
> +               if (tmp != -1) {
> +                       tmp = tmp * MSEC_PER_SEC;

Check for overflow here or, better yet, check for it
in tcp_keepalive_time_valid().

> +               }
> +#endif /* ifdef __sun__ */
> +               tcp_keepalive_time = tmp;
> +       }
> +
> +
> +       /* Get value for tcp_keepalive_intvl in seconds */
> +       value = lttng_secure_getenv(LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL_ENV);
> +       if (value) {
> +               int tmp;
> +               errno = 0;
> +               tmp = (int) strtol(value, NULL, 0);
> +               if (errno != 0 || tmp < -1) {
> +                       PERROR("TCP_KEEP_ALIVE interval parse");
> +                       ret = 1;
> +                       goto error;
> +               } else {
> +                       if (tmp >= 0) {
> +#ifdef __sun__
> +                               WARN("Solaris does not support local override of tcp_keepalive_intvl. " LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL_ENV);
> +                               ret = 1;
> +                               goto error;
> +#else
> +                               tcp_keepalive_intvl = tmp;
> +#endif /* __sun__ */
> +                       }
> +               }
> +       }
> +
> +       /* Get value for tcp_keepalive_probes */
> +       value = lttng_secure_getenv(LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES_ENV);
> +       if (value) {
> +               int tmp;
> +               errno = 0;
> +               tmp = (int) strtol(value, NULL, 0);
> +               if (errno != 0 || tmp < -1) {
> +                       PERROR("TCP_KEEP_ALIVE probes parse");
> +                       ret = 1;
> +                       goto error;
> +               } else {
> +                       if (tmp >= 0) {
> +#ifdef __sun__
> +                               WARN("Solaris does not support local override of tcp_keepalive_probes. " LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES_ENV);
> +                               ret = 1;
> +                               goto error;
> +#else
> +                               tcp_keepalive_probes = tmp;
> +#endif /* __sun__ */
> +                       }
> +               }
> +       }
> +
> +       DBG("TCP_KEEP_ALIVE enabled");
> +       if (tcp_keepalive_time > -1) {
> +               DBG("Overwrite tcp_keepalive_time to %d", tcp_keepalive_time);

"Overwrite" -> "Overriding"

Same below.

> +       }
> +       if (tcp_keepalive_intvl > -1) {
> +               DBG("Overwrite tcp_keepalive_intvl to %d", tcp_keepalive_intvl);
> +       }
> +       if (tcp_keepalive_probes > -1) {
> +               DBG("Overwrite tcp_keepalive_time to %d", tcp_keepalive_probes);

Don't use platform-specific configuration names since they are not the same
everywhere. Let's stick to our terminology and the man page will ensure we
explain what the option maps to on each platform (when applicable).

> +       }
> +       ret = 0;
> +
> +error:
> +disabled:
> +       return ret;
> +}
> +
> +/*
> + * Set the socket options regarding tcp_keepalive.
> + */
> +int tcp_keepalive_setsockopt(int fd)
> +{
> +       int ret;
> +       int val = 1;
> +
> +       if (!tcp_keepalive_enabled) {
> +               ret = 0;
> +               goto end;
> +       }
> +
> +       ret = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val,
> +                       sizeof(val));
> +       if (ret < 0) {
> +               PERROR("setsockopt so_keepalive");
> +               goto end;
> +       }
> +
> +#if !defined(__sun__) || (defined(__sun__) && defined(TCP_KEEPALIVE_THRESHOLD))
> +       if (tcp_keepalive_time > -1) {
> +               ret = setsockopt(fd, SOL_TCP, COMPAT_TCP_KEEPIDLE, &tcp_keepalive_time,
> +                               sizeof(tcp_keepalive_time));
> +               if (ret < 0) {
> +                       PERROR("setsockopt TCP_KEEPIDLE");
> +                       goto end;
> +               }
> +       }
> +#endif /* ! defined(__sun__) || (defined(__sun__) && defined(TCP_KEEPALIVE_THRESHOLD)) */
> +
> +       /* Sun does not support either tcp_keepalive_probes or
> +        * tcp_keepalive_intvl. Inferring a value for
> +        * TCP_KEEPALIVE_ABORT_THRESHOLD doing
> +        * tcp_keepalive_probes * tcp_keepalive_intvl could yield a good
> +        * alternative but Solaris does not detail the algorithm used (constant
> +        * time retry like linux or somthing fancier). So simply ignore those
> +        * setting on solaris for now.
> +        */
> +#ifndef __sun__
> +       if (tcp_keepalive_intvl > -1) {
> +               ret = setsockopt(fd, SOL_TCP, TCP_KEEPINTVL, &tcp_keepalive_intvl,
> +                               sizeof(tcp_keepalive_intvl));
> +               if (ret < 0) {
> +                       PERROR("setsockopt TCP_KEEPINTVL");
> +                       goto end;
> +               }
> +       }
> +
> +       if (tcp_keepalive_probes > -1) {
> +               ret = setsockopt(fd, SOL_TCP, TCP_KEEPCNT, &tcp_keepalive_probes,
> +                               sizeof(tcp_keepalive_probes));
> +               if (ret < 0) {
> +                       PERROR("setsockopt TCP_KEEPCNT");
> +                       goto end;
> +               }
> +       }
> +#endif /* __sun__ */
> +end:

There should be no platform-specific code in this function given
the support check proposed.

> +       return ret;
> +}
> diff --git a/src/bin/lttng-relayd/tcp_keep_alive.h b/src/bin/lttng-relayd/tcp_keep_alive.h
> new file mode 100644
> index 00000000..cd8025a0
> --- /dev/null
> +++ b/src/bin/lttng-relayd/tcp_keep_alive.h
> @@ -0,0 +1,25 @@
> +#ifndef RELAYD_TCP_KEEP_ALIVE_H
> +#define RELAYD_TCP_KEEP_ALIVE_H
> +
> +/*
> + * Copyright (C) 2017 - Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2 only,
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +
> +int tcp_keepalive_get_settings(void);
> +int tcp_keepalive_setsockopt(int socket_fd);
> +

Mark both functions as 'static'.


> +#endif /* RELAYD_UTILS_H */

Change to RELAYD_TCP_KEEP_ALIVE_H


> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd
       [not found] ` <20171006182247.6874-2-jonathan.rajotte-julien@efficios.com>
  2017-11-23 19:03   ` [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd Jérémie Galarneau
@ 2017-12-04 15:23   ` Jérémie Galarneau
  1 sibling, 0 replies; 4+ messages in thread
From: Jérémie Galarneau @ 2017-12-04 15:23 UTC (permalink / raw)
  To: Jonathan Rajotte; +Cc: lttng-dev, Jeremie Galarneau

On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien@efficios.com> wrote:
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
> ---
>  doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
> index 218526a1..e37ce845 100644
> --- a/doc/man/lttng-relayd.8.txt
> +++ b/doc/man/lttng-relayd.8.txt
> @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
>  `LTTNG_RELAYD_HEALTH`::
>      Path to relay daemon health's socket.
>
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
> +    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
> +    peers.

I think we should add an explanation of why this option matters.

I would enable the keep-alive mechanism by default considering the fairly
severe problems users can experience without this mechanism in place
(fd exhaustion
and what amounts to a memory leak) when connections are dropped.

Given this, it would make sense to change the variable to
'LTTNG_RELAYD_TCP_KEEP_ALIVE' which the user can set to 0/1 to override the
default policy (default to 'on').

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
> +    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
> +    A value of -1 lets the operating system manage this parameter (default).

Pointing the user to tcp(7) is okay, but a sentence to explain what this
setting does would be helpful.

"The number of seconds a connection needs to be idle before TCP begins
sending out keep-alive probes."

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME.

Please add the time unit expected in this variable. Also, what would 0
mean here?

While I agree with -1 == OS default, I'm wondering if we should be
more aggressive.

The Linux default of two hours seems like an awfully long time... Let's keep
the platform defaults for now and I'll do some research to see what would be a
reasonable value here if we decide to shorten the interval.

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
> +    See tcp(7) tcp_keepalive_probes.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT.

> +
> +NOTE: Not supported on Solaris.

Is this option supported by Cygwin?


> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
> +    See tcp(7) tcp_keepalive_intvl.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL.

> +
> +NOTE: Not supported on Solaris.

Same comment about Cygwin.


>
>  FILES
>  -----
> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-12-04 15:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20171006182247.6874-1-jonathan.rajotte-julien@efficios.com>
2017-10-06 18:22 ` [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd Jonathan Rajotte
2017-12-04 15:23 ` [PATCH lttng-tools v3 1/2] lttng-relayd: use tcp keep-alive mechanism to detect dead-peer Jérémie Galarneau
     [not found] ` <20171006182247.6874-2-jonathan.rajotte-julien@efficios.com>
2017-11-23 19:03   ` [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd Jérémie Galarneau
2017-12-04 15:23   ` Jérémie Galarneau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.