* [PATCH 0/2] keepalive default
@ 2020-07-08 19:15 Vladimir Sementsov-Ogievskiy
  2020-07-08 19:15 ` [PATCH 1/2] sockets: keep-alive settings Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-08 19:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: vsementsov, berrange, armbru, kraxel, den

Hi all!

We realized that keepalive is almost superfluous with the default of 2 hours
in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to set up keepalive
for the whole system doesn't seem right; better to set it up per-socket.

This series suggests setting some defaults for the keepalive settings, as
well as enabling keepalive itself by default.

Keepalive helps avoid hanging on I/O when the other side is down.
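
For context, the per-socket tuning this series is about boils down to the
following (a minimal sketch under our own naming; the numeric values are the
RFC defaults from patch 1, not settled):

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Enable keepalive on one socket and override the system-wide
 * tcp_keepalive_* defaults for it. Returns 0 on success, -1 on error. */
static int set_keepalive(int sock, int idle, int interval, int count)
{
    int on = 1;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
    /* Probe after 'idle' seconds of silence, every 'interval' seconds,
     * giving up (and breaking the connection) after 'count' probes. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                   sizeof(interval)) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0) {
        return -1;
    }
    return 0;
}
```

Without the three TCP_KEEP* overrides, enabling SO_KEEPALIVE alone inherits
the system-wide 2-hour idle time, which is the behaviour being argued against
here.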

Vladimir Sementsov-Ogievskiy (2):
  sockets: keep-alive settings
  util/qemu-sockets: make keep-alive enabled by default

 qapi/sockets.json   | 33 +++++++++++++++++-
 util/qemu-sockets.c | 81 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 102 insertions(+), 12 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] sockets: keep-alive settings
  2020-07-08 19:15 [PATCH 0/2] keepalive default Vladimir Sementsov-Ogievskiy
@ 2020-07-08 19:15 ` Vladimir Sementsov-Ogievskiy
  2020-07-09  8:33   ` Daniel P. Berrangé
  2020-07-08 19:15 ` [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-08 19:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: vsementsov, berrange, armbru, kraxel, den

Introduce keep-alive settings (TCP_KEEPCNT, TCP_KEEPIDLE,
TCP_KEEPINTVL) and choose some defaults.

The Linux default of 2 hours for /proc/sys/net/ipv4/tcp_keepalive_time
(corresponding to TCP_KEEPIDLE) makes the keep-alive option almost
superfluous. Let's add the possibility to set these options by hand
and specify defaults that result in a smaller total time to terminate
an idle connection.

Do not document the default values in QAPI, as they may be altered in
the future (a careful user will use explicit values).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

The suggested default numbers are an RFC; any better suggestion is welcome.
I just looked at /etc/libvirt/qemu.conf on my system and took the values of
keepalive_interval and keepalive_count.
The only thing I'm sure of is that 2 hours is too long.
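
As an aside for readers, the alternate introduced below means the
`keep-alive` member of `InetSocketAddress` accepts either form; a
hypothetical wire example (host, port, and the numeric values are
illustrative, not endorsed defaults):

```
{ "host": "nbd.example.org", "port": "10809",
  "keep-alive": true }

{ "host": "nbd.example.org", "port": "10809",
  "keep-alive": { "idle": 30, "interval": 30, "count": 20 } }
```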

 qapi/sockets.json   | 33 +++++++++++++++++++-
 util/qemu-sockets.c | 76 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/qapi/sockets.json b/qapi/sockets.json
index cbd6ef35d0..73ff66a5d5 100644
--- a/qapi/sockets.json
+++ b/qapi/sockets.json
@@ -37,6 +37,37 @@
     'host': 'str',
     'port': 'str' } }
 
+##
+# @KeepAliveSettings:
+#
+# @idle: The time (in seconds) the connection needs to remain idle
+#        before TCP starts sending keepalive probes (sets TCP_KEEPIDLE).
+# @interval: The time (in seconds) between individual keepalive probes
+#            (sets TCP_KEEPINTVL).
+# @count: The maximum number of keepalive probes TCP should send before
+#         dropping the connection (sets TCP_KEEPCNT).
+#
+# Since: 5.2
+##
+{ 'struct': 'KeepAliveSettings',
+  'data': {
+    'idle': 'int',
+    'interval': 'int',
+    'count': 'int' } }
+
+##
+# @KeepAliveField:
+#
+# @enabled: If true, enable keep-alive with some default settings
+# @settings: Enable keep-alive and use explicit settings
+#
+# Since: 5.2
+##
+{ 'alternate': 'KeepAliveField',
+  'data': {
+    'enabled': 'bool',
+    'settings': 'KeepAliveSettings' } }
+
 ##
 # @InetSocketAddress:
 #
@@ -65,7 +96,7 @@
     '*to': 'uint16',
     '*ipv4': 'bool',
     '*ipv6': 'bool',
-    '*keep-alive': 'bool' } }
+    '*keep-alive': 'KeepAliveField' } }
 
 ##
 # @UnixSocketAddress:
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index b37d288866..b961963472 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -433,6 +433,57 @@ static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
     return res;
 }
 
+/*
+ * inet_set_keepalive
+ *
+ * Handle keep_alive settings. If user specified settings explicitly, fail if
+ * can't set the settings. If user just enabled keep-alive, not specifying the
+ * settings, try to set defaults but ignore failures.
+ */
+static int inet_set_keepalive(int sock, bool has_keep_alive,
+                              KeepAliveField *keep_alive, Error **errp)
+{
+    int ret;
+    int val;
+    bool has_settings = has_keep_alive &&  keep_alive->type == QTYPE_QDICT;
+
+    if (!has_keep_alive || (keep_alive->type == QTYPE_QBOOL &&
+                            !keep_alive->u.enabled))
+    {
+        return 0;
+    }
+
+    val = 1;
+    ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
+    if (ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
+        return -1;
+    }
+
+    val = has_settings ? keep_alive->u.settings.idle : 30;
+    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val));
+    if (has_settings && ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set TCP_KEEPIDLE");
+        return -1;
+    }
+
+    val = has_settings ? keep_alive->u.settings.interval : 30;
+    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val));
+    if (has_settings && ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set TCP_KEEPINTVL");
+        return -1;
+    }
+
+    val = has_settings ? keep_alive->u.settings.count : 20;
+    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val));
+    if (has_settings && ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set TCP_KEEPCNT");
+        return -1;
+    }
+
+    return 0;
+}
+
 /**
  * Create a socket and connect it to an address.
  *
@@ -468,16 +519,11 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error **errp)
         return sock;
     }
 
-    if (saddr->keep_alive) {
-        int val = 1;
-        int ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
-                                  &val, sizeof(val));
-
-        if (ret < 0) {
-            error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
-            close(sock);
-            return -1;
-        }
+    if (inet_set_keepalive(sock, saddr->has_keep_alive, saddr->keep_alive,
+                           errp) < 0)
+    {
+        close(sock);
+        return -1;
     }
 
     return sock;
@@ -677,12 +723,20 @@ int inet_parse(InetSocketAddress *addr, const char *str, Error **errp)
     }
     begin = strstr(optstr, ",keep-alive");
     if (begin) {
+        bool val;
+
         if (inet_parse_flag("keep-alive", begin + strlen(",keep-alive"),
-                            &addr->keep_alive, errp) < 0)
+                            &val, errp) < 0)
         {
             return -1;
         }
+
         addr->has_keep_alive = true;
+        addr->keep_alive = g_new(KeepAliveField, 1);
+        *addr->keep_alive = (KeepAliveField) {
+            .type = QTYPE_QBOOL,
+            .u.enabled = val
+        };
     }
     return 0;
 }
-- 
2.21.0




* [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-08 19:15 [PATCH 0/2] keepalive default Vladimir Sementsov-Ogievskiy
  2020-07-08 19:15 ` [PATCH 1/2] sockets: keep-alive settings Vladimir Sementsov-Ogievskiy
@ 2020-07-08 19:15 ` Vladimir Sementsov-Ogievskiy
  2020-07-09  8:29   ` Daniel P. Berrangé
  2020-07-08 19:41 ` [PATCH 0/2] keepalive default no-reply
  2020-07-09  8:35 ` Daniel P. Berrangé
  3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-08 19:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: vsementsov, berrange, armbru, kraxel, den

Keep-alive won't hurt; let's try to enable it even when not requested by
the user.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 util/qemu-sockets.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index b961963472..f6851376f5 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -438,7 +438,8 @@ static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
  *
  * Handle keep_alive settings. If user specified settings explicitly, fail if
  * can't set the settings. If user just enabled keep-alive, not specifying the
- * settings, try to set defaults but ignore failures.
+ * settings, try to set defaults but ignore failures. If keep-alive option is
+ * not specified, try to set it but ignore failures.
  */
 static int inet_set_keepalive(int sock, bool has_keep_alive,
                               KeepAliveField *keep_alive, Error **errp)
@@ -447,8 +448,8 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
     int val;
     bool has_settings = has_keep_alive &&  keep_alive->type == QTYPE_QDICT;
 
-    if (!has_keep_alive || (keep_alive->type == QTYPE_QBOOL &&
-                            !keep_alive->u.enabled))
+    if (has_keep_alive &&
+        keep_alive->type == QTYPE_QBOOL && !keep_alive->u.enabled)
     {
         return 0;
     }
@@ -456,8 +457,12 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
     val = 1;
     ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
     if (ret < 0) {
-        error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
-        return -1;
+        if (has_keep_alive) {
+            error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
+            return -1;
+        } else {
+            return 0;
+        }
     }
 
     val = has_settings ? keep_alive->u.settings.idle : 30;
-- 
2.21.0




* Re: [PATCH 0/2] keepalive default
  2020-07-08 19:15 [PATCH 0/2] keepalive default Vladimir Sementsov-Ogievskiy
  2020-07-08 19:15 ` [PATCH 1/2] sockets: keep-alive settings Vladimir Sementsov-Ogievskiy
  2020-07-08 19:15 ` [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default Vladimir Sementsov-Ogievskiy
@ 2020-07-08 19:41 ` no-reply
  2020-07-09  8:35 ` Daniel P. Berrangé
  3 siblings, 0 replies; 14+ messages in thread
From: no-reply @ 2020-07-08 19:41 UTC (permalink / raw)
  To: vsementsov; +Cc: vsementsov, berrange, qemu-devel, armbru, kraxel, den

Patchew URL: https://patchew.org/QEMU/20200708191540.28455-1-vsementsov@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/hppa/trace.o
In file included from /tmp/qemu-test/src/util/qemu-sockets.c:24:
/tmp/qemu-test/src/util/qemu-sockets.c: In function 'inet_set_keepalive':
/tmp/qemu-test/src/util/qemu-sockets.c:469:46: error: 'TCP_KEEPIDLE' undeclared (first use in this function)
  469 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val));
      |                                              ^~~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
---
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
/tmp/qemu-test/src/util/qemu-sockets.c:476:46: error: 'TCP_KEEPINTVL' undeclared (first use in this function)
  476 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val));
      |                                              ^~~~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
/tmp/qemu-test/src/util/qemu-sockets.c:483:46: error: 'TCP_KEEPCNT' undeclared (first use in this function)
  483 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val));
      |                                              ^~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
make: *** [/tmp/qemu-test/src/rules.mak:69: util/qemu-sockets.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=bab255245d94432aabefeb4189c6aced', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-mck04b5u/src/docker-src.2020-07-08-15.38.08.17376:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=bab255245d94432aabefeb4189c6aced
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-mck04b5u/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    3m11.264s
user    0m8.976s


The full log is available at
http://patchew.org/logs/20200708191540.28455-1-vsementsov@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
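
For readers hitting the same failure: the errors above come from the
TCP_KEEP* constants being absent on non-Linux targets (the mingw headers in
this build). One hypothetical way to keep the code compiling everywhere is
to guard each option; the helper name and fallback behaviour below are
illustrative, not the fix the series ultimately uses:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* TCP_KEEPIDLE is not defined on all platforms (e.g. the mingw build
 * above), so fall back to a runtime "not supported" error instead of
 * failing to compile. The same guard applies to TCP_KEEPINTVL and
 * TCP_KEEPCNT. */
static int set_keep_idle(int sock, int idle)
{
#ifdef TCP_KEEPIDLE
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
#else
    (void)sock;
    (void)idle;
    errno = ENOTSUP;
    return -1;
#endif
}
```

On Linux the guard compiles down to the real setsockopt() call; on platforms
without TCP_KEEPIDLE it degrades to a soft failure the caller can choose to
ignore.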


* Re: [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-08 19:15 ` [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default Vladimir Sementsov-Ogievskiy
@ 2020-07-09  8:29   ` Daniel P. Berrangé
  2020-07-09  8:49     ` Vladimir Sementsov-Ogievskiy
  2020-07-09  8:54     ` Denis V. Lunev
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2020-07-09  8:29 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, kraxel, qemu-devel, armbru

On Wed, Jul 08, 2020 at 10:15:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Keep-alive won't hurt, let's try to enable it even if not requested by
> user.

Keep-alive intentionally breaks TCP connections earlier than normal
in the face of transient networking problems.

The question is more about which type of pain is more desirable. A
stall in the network connection (for a potentially very long time),
or an intentionally broken socket.

I'm not at all convinced it is a good idea to intentionally break
/all/ QEMU sockets in the face of transient problems, even if the
problems last for 2 hours or more. 

I could see keep-alives being OK on some QEMU sockets. For example
VNC/SPICE clients, as there is no downside to proactively culling
them since they can trivially reconnect. Migration too is a quite
reasonable place to use keep-alives, as you generally want migration to
run to completion in a short amount of time, and aborting migration
needs to be safe no matter what.

Breaking chardevs or block devices or network devices that use
QEMU sockets though will be disruptive. The only solution once
those backends have a dead socket is going to be to kill QEMU
and cold-boot the VM again.
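
A back-of-the-envelope figure may help weigh that trade-off: the worst-case
time for keepalive to declare a peer dead is roughly idle + interval * count
seconds. With the Linux system-wide defaults (7200/75/9) that is about
7875 s; the defaults proposed in patch 1 (30/30/20) give 630 s. A trivial
sketch of the arithmetic (helper name is ours, not QEMU's):

```c
#include <assert.h>

/* Worst-case seconds before TCP keepalive gives up on a dead peer:
 * wait 'idle' seconds of silence, then send 'count' probes spaced
 * 'interval' seconds apart. */
static int keepalive_worst_case(int idle, int interval, int count)
{
    return idle + interval * count;
}
```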


> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  util/qemu-sockets.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index b961963472..f6851376f5 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -438,7 +438,8 @@ static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
>   *
>   * Handle keep_alive settings. If user specified settings explicitly, fail if
>   * can't set the settings. If user just enabled keep-alive, not specifying the
> - * settings, try to set defaults but ignore failures.
> + * settings, try to set defaults but ignore failures. If keep-alive option is
> + * not specified, try to set it but ignore failures.
>   */
>  static int inet_set_keepalive(int sock, bool has_keep_alive,
>                                KeepAliveField *keep_alive, Error **errp)
> @@ -447,8 +448,8 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
>      int val;
>      bool has_settings = has_keep_alive &&  keep_alive->type == QTYPE_QDICT;
>  
> -    if (!has_keep_alive || (keep_alive->type == QTYPE_QBOOL &&
> -                            !keep_alive->u.enabled))
> +    if (has_keep_alive &&
> +        keep_alive->type == QTYPE_QBOOL && !keep_alive->u.enabled)
>      {
>          return 0;
>      }
> @@ -456,8 +457,12 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
>      val = 1;
>      ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
>      if (ret < 0) {
> -        error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
> -        return -1;
> +        if (has_keep_alive) {
> +            error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
> +            return -1;
> +        } else {
> +            return 0;
> +        }
>      }
>  
>      val = has_settings ? keep_alive->u.settings.idle : 30;
> -- 
> 2.21.0
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 1/2] sockets: keep-alive settings
  2020-07-08 19:15 ` [PATCH 1/2] sockets: keep-alive settings Vladimir Sementsov-Ogievskiy
@ 2020-07-09  8:33   ` Daniel P. Berrangé
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2020-07-09  8:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, kraxel, qemu-devel, armbru

On Wed, Jul 08, 2020 at 10:15:38PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Introduce keep-alive settings (TCP_KEEPCNT, TCP_KEEPIDLE,
> TCP_KEEPINTVL) and chose some defaults.
> 
> The linux default of 2 hours for /proc/tcp_keepalive_time
> (corresponding to TCP_KEEPIDLE) makes keep-alive option almost
> superfluous. Let's add a possibility to set the options by hand
> and specify some defaults resulting in smaller total time to terminate
> idle connection.

As you say, 2 hours is just a default. The sysadmin can override that
as they wish to change the behaviour globally on the system, so using
the global settings is quite reasonable IMHO.

> Do not document the default values in QAPI as they may be altered in
> future (careful user will use explicit values).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Suggested default numbers are RFC, any better suggestion is welcome.
> I just looked at /etc/libvirt/qemu.conf in my system and take values of
> keepalive_interval and keepalive_count.
> The only thing I'm sure in is that 2 hours is too long.
> 
>  qapi/sockets.json   | 33 +++++++++++++++++++-
>  util/qemu-sockets.c | 76 ++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 97 insertions(+), 12 deletions(-)
> 
> diff --git a/qapi/sockets.json b/qapi/sockets.json
> index cbd6ef35d0..73ff66a5d5 100644
> --- a/qapi/sockets.json
> +++ b/qapi/sockets.json
> @@ -37,6 +37,37 @@
>      'host': 'str',
>      'port': 'str' } }
>  
> +##
> +# @KeepAliveSettings:
> +#
> +# @idle: The time (in seconds) the connection needs to remain idle
> +#        before TCP starts sending keepalive probes (sets TCP_KEEPIDLE).
> +# @interval: The time (in seconds) between individual keepalive probes
> +#            (sets TCP_KEEPINTVL).
> +# @count: The maximum number of keepalive probes TCP should send before
> +#         dropping the connection (sets TCP_KEEPCNT).
> +#
> +# Since: 5.2
> +##
> +{ 'struct': 'KeepAliveSettings',
> +  'data': {
> +    'idle': 'int',
> +    'interval': 'int',
> +    'count': 'int' } }
> +
> +##
> +# @KeepAliveField:
> +#
> +# @enabled: If true, enable keep-alive with some default settings
> +# @settings: Enable keep-alive and use explicit settings
> +#
> +# Since: 5.2
> +##
> +{ 'alternate': 'KeepAliveField',
> +  'data': {
> +    'enabled': 'bool',
> +    'settings': 'KeepAliveSettings' } }
> +
>  ##
>  # @InetSocketAddress:
>  #
> @@ -65,7 +96,7 @@
>      '*to': 'uint16',
>      '*ipv4': 'bool',
>      '*ipv6': 'bool',
> -    '*keep-alive': 'bool' } }
> +    '*keep-alive': 'KeepAliveField' } }
>  
>  ##
>  # @UnixSocketAddress:
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index b37d288866..b961963472 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -433,6 +433,57 @@ static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
>      return res;
>  }
>  
> +/*
> + * inet_set_keepalive
> + *
> + * Handle keep_alive settings. If user specified settings explicitly, fail if
> + * can't set the settings. If user just enabled keep-alive, not specifying the
> + * settings, try to set defaults but ignore failures.
> + */
> +static int inet_set_keepalive(int sock, bool has_keep_alive,
> +                              KeepAliveField *keep_alive, Error **errp)
> +{
> +    int ret;
> +    int val;
> +    bool has_settings = has_keep_alive &&  keep_alive->type == QTYPE_QDICT;
> +
> +    if (!has_keep_alive || (keep_alive->type == QTYPE_QBOOL &&
> +                            !keep_alive->u.enabled))
> +    {
> +        return 0;
> +    }
> +
> +    val = 1;
> +    ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
> +    if (ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
> +        return -1;
> +    }
> +
> +    val = has_settings ? keep_alive->u.settings.idle : 30;
> +    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val));
> +    if (has_settings && ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set TCP_KEEPIDLE");
> +        return -1;
> +    }
> +
> +    val = has_settings ? keep_alive->u.settings.interval : 30;
> +    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val));
> +    if (has_settings && ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set TCP_KEEPINTVL");
> +        return -1;
> +    }
> +
> +    val = has_settings ? keep_alive->u.settings.count : 20;
> +    ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val));
> +    if (has_settings && ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set TCP_KEEPCNT");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  /**
>   * Create a socket and connect it to an address.
>   *
> @@ -468,16 +519,11 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error **errp)
>          return sock;
>      }
>  
> -    if (saddr->keep_alive) {
> -        int val = 1;
> -        int ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
> -                                  &val, sizeof(val));
> -
> -        if (ret < 0) {
> -            error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
> -            close(sock);
> -            return -1;
> -        }
> +    if (inet_set_keepalive(sock, saddr->has_keep_alive, saddr->keep_alive,
> +                           errp) < 0)
> +    {
> +        close(sock);
> +        return -1;
>      }
>  
>      return sock;
> @@ -677,12 +723,20 @@ int inet_parse(InetSocketAddress *addr, const char *str, Error **errp)
>      }
>      begin = strstr(optstr, ",keep-alive");
>      if (begin) {
> +        bool val;
> +
>          if (inet_parse_flag("keep-alive", begin + strlen(",keep-alive"),
> -                            &addr->keep_alive, errp) < 0)
> +                            &val, errp) < 0)
>          {
>              return -1;
>          }
> +
>          addr->has_keep_alive = true;
> +        addr->keep_alive = g_new(KeepAliveField, 1);
> +        *addr->keep_alive = (KeepAliveField) {
> +            .type = QTYPE_QBOOL,
> +            .u.enabled = val
> +        };
>      }
>      return 0;
>  }
> -- 
> 2.21.0
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 0/2] keepalive default
  2020-07-08 19:15 [PATCH 0/2] keepalive default Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-07-08 19:41 ` [PATCH 0/2] keepalive default no-reply
@ 2020-07-09  8:35 ` Daniel P. Berrangé
  2020-07-09 15:34   ` Eric Blake
  3 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrangé @ 2020-07-09  8:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, kraxel, qemu-devel, armbru

On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We understood, that keepalive is almost superfluous with default 2 hours
> in /proc/tcp_keepalive_time. Forcing user to setup keepalive for the
> whole system doesn't seem right, better setup it per-socket.

Adding the ability to explicitly configure the keepalive settings makes
sense for QEMU. Completely ignoring the system defaults when no explicit
settings are given, though, is not valid IMHO.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-09  8:29   ` Daniel P. Berrangé
@ 2020-07-09  8:49     ` Vladimir Sementsov-Ogievskiy
  2020-07-09 11:52       ` Daniel P. Berrangé
  2020-07-09  8:54     ` Denis V. Lunev
  1 sibling, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-09  8:49 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: den, kraxel, qemu-devel, armbru

09.07.2020 11:29, Daniel P. Berrangé wrote:
> On Wed, Jul 08, 2020 at 10:15:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Keep-alive won't hurt, let's try to enable it even if not requested by
>> user.
> 
> Keep-alive intentionally breaks TCP connections earlier than normal
> in face of transient networking problems.
> 
> The question is more about which type of pain is more desirable. A
> stall in the network connection (for a potentially very long time),
> or an intentionally broken socket.
> 
> I'm not at all convinced it is a good idea to intentionally break
> /all/ QEMU sockets in the face of transient problems, even if the
> problems last for 2 hours or more.
> 
> I could see keep-alives being ok on some QEMU socket. For example
> VNC/SPICE clients, as there is no downside to proactively culling
> them as they can trivially reconnect. Migration too is quite
> reasonable to use keep alives, as you generally want migration to
> run to completion in a short amount of time, and aborting migration
> needs to be safe no matter what.
> 
> Breaking chardevs or block devices or network devices that use
> QEMU sockets though will be disruptive. The only solution once
> those backends have a dead socket is going to be to kill QEMU
> and cold-boot the VM again.
> 

Reasonable, thanks for the explanation.

We are mostly interested in keep-alive for migration and NBD connections
(the NBD driver has the ability to reconnect). What do you think about
setting keep-alive (with some KEEPIDLE smaller than 2 hours) by default
for migration and NBD (at least when NBD reconnect is enabled)? Would
that be valid?

> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   util/qemu-sockets.c | 15 ++++++++++-----
>>   1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
>> index b961963472..f6851376f5 100644
>> --- a/util/qemu-sockets.c
>> +++ b/util/qemu-sockets.c
>> @@ -438,7 +438,8 @@ static struct addrinfo *inet_parse_connect_saddr(InetSocketAddress *saddr,
>>    *
>>    * Handle keep_alive settings. If user specified settings explicitly, fail if
>>    * can't set the settings. If user just enabled keep-alive, not specifying the
>> - * settings, try to set defaults but ignore failures.
>> + * settings, try to set defaults but ignore failures. If keep-alive option is
>> + * not specified, try to set it but ignore failures.
>>    */
>>   static int inet_set_keepalive(int sock, bool has_keep_alive,
>>                                 KeepAliveField *keep_alive, Error **errp)
>> @@ -447,8 +448,8 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
>>       int val;
>>       bool has_settings = has_keep_alive &&  keep_alive->type == QTYPE_QDICT;
>>   
>> -    if (!has_keep_alive || (keep_alive->type == QTYPE_QBOOL &&
>> -                            !keep_alive->u.enabled))
>> +    if (has_keep_alive &&
>> +        keep_alive->type == QTYPE_QBOOL && !keep_alive->u.enabled)
>>       {
>>           return 0;
>>       }
>> @@ -456,8 +457,12 @@ static int inet_set_keepalive(int sock, bool has_keep_alive,
>>       val = 1;
>>       ret = qemu_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
>>       if (ret < 0) {
>> -        error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
>> -        return -1;
>> +        if (has_keep_alive) {
>> +            error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
>> +            return -1;
>> +        } else {
>> +            return 0;
>> +        }
>>       }
>>   
>>       val = has_settings ? keep_alive->u.settings.idle : 30;
>> -- 
>> 2.21.0
>>
> 
> Regards,
> Daniel
> 


-- 
Best regards,
Vladimir



* Re: [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-09  8:29   ` Daniel P. Berrangé
  2020-07-09  8:49     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-09  8:54     ` Denis V. Lunev
  2020-07-09 11:40       ` Markus Armbruster
  1 sibling, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2020-07-09  8:54 UTC (permalink / raw)
  To: Daniel P. Berrangé, Vladimir Sementsov-Ogievskiy
  Cc: kraxel, qemu-devel, armbru

On 7/9/20 11:29 AM, Daniel P. Berrangé wrote:
> On Wed, Jul 08, 2020 at 10:15:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Keep-alive won't hurt, let's try to enable it even if not requested by
>> user.
> Keep-alive intentionally breaks TCP connections earlier than normal
> in face of transient networking problems.
>
> The question is more about which type of pain is more desirable. A
> stall in the network connection (for a potentially very long time),
> or an intentionally broken socket.
>
> I'm not at all convinced it is a good idea to intentionally break
> /all/ QEMU sockets in the face of transient problems, even if the
> problems last for 2 hours or more. 
>
> I could see keep-alives being ok on some QEMU socket. For example
> VNC/SPICE clients, as there is no downside to proactively culling
> them as they can trivially reconnect. Migration too is quite
> reasonable to use keep alives, as you generally want migration to
> run to completion in a short amount of time, and aborting migration
> needs to be safe no matter what.
>
> Breaking chardevs or block devices or network devices that use
> QEMU sockets though will be disruptive. The only solution once
> those backends have a dead socket is going to be to kill QEMU
> and cold-boot the VM again.

Nope, and this is exactly what we are trying to achieve.

Let us assume that QEMU NBD is connected to the
outside world, e.g. to some HA service running in
another virtual machine. Once that far-away VM
dies, it is re-started on some other host with
the same IP.

QEMU NBD is able to reconnect to the same
endpoint, and the process is transparent to the
guest.

This is the workflow we are trying to improve.

Anyway, sitting on a dead socket is not
productive. This is like NFS hard and soft
mounts: in the hypervisor world, hard mounts
(the default before this patch) lead to various
undetectable deadlocks, which is why we are
proposing soft with such defaults.

It should also be noted that this is more
consistent: today we may hit the problem on a
write to the dead socket, OR we may hang
forever, so the problem is still possible in
the current state. With the new settings we
would observe the problem consistently.

Den


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-09  8:54     ` Denis V. Lunev
@ 2020-07-09 11:40       ` Markus Armbruster
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2020-07-09 11:40 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Vladimir Sementsov-Ogievskiy, Daniel P. Berrangé,
	kraxel, qemu-devel

"Denis V. Lunev" <den@openvz.org> writes:

> On 7/9/20 11:29 AM, Daniel P. Berrangé wrote:
>> On Wed, Jul 08, 2020 at 10:15:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> Keep-alive won't hurt, let's try to enable it even if not requested by
>>> user.
>> Keep-alive intentionally breaks TCP connections earlier than normal
>> in face of transient networking problems.
>>
>> The question is more about which type of pain is more desirable. A
>> stall in the network connection (for a potentially very long time),
>> or an intentionally broken socket.
>>
>> I'm not at all convinced it is a good idea to intentionally break
>> /all/ QEMU sockets in the face of transient problems, even if the
>> problems last for 2 hours or more. 
>>
>> I could see keep-alives being ok on some QEMU socket. For example
>> VNC/SPICE clients, as there is no downside to proactively culling
>> them as they can trivially reconnect. Migration too is quite
>> reasonable to use keep alives, as you generally want migration to
>> run to completion in a short amount of time, and aborting migration
>> needs to be safe no matter what.
>>
>> Breaking chardevs or block devices or network devices that use
>> QEMU sockets though will be disruptive. The only solution once
>> those backends have a dead socket is going to be to kill QEMU
>> and cold-boot the VM again.
>
> Nope, and this is exactly what we are trying to achieve.
>
> Let us assume that QEMU NBD is connected to the
> outside world, e.g. to some HA service running in
> another virtual machine. Once that far-away VM
> dies, it is re-started on some other host with
> the same IP.
>
> QEMU NBD is able to reconnect to the same
> endpoint, and the process is transparent to the
> guest.
>
> This is the workflow we are trying to improve.
>
> Anyway, sitting on a dead socket is not
> productive. This is like NFS hard and soft
> mounts: in the hypervisor world, hard mounts
> (the default before this patch) lead to various
> undetectable deadlocks, which is why we are
> proposing soft with such defaults.
>
> It should also be noted that this is more
> consistent: today we may hit the problem on a
> write to the dead socket, OR we may hang
> forever, so the problem is still possible in
> the current state. With the new settings we
> would observe the problem consistently.

Daniel's point remains valid: keep-alive makes sense only for sockets
where we can recover from connection breakage.  When graceful recovery
is impossible, we shouldn't aggressively break unresponsive connections,
throwing away the chance (however slim) of them becoming responsive
again.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default
  2020-07-09  8:49     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-09 11:52       ` Daniel P. Berrangé
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2020-07-09 11:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, kraxel, armbru, qemu-devel

On Thu, Jul 09, 2020 at 11:49:17AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 09.07.2020 11:29, Daniel P. Berrangé wrote:
> > On Wed, Jul 08, 2020 at 10:15:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > Keep-alive won't hurt, let's try to enable it even if not requested by
> > > user.
> > 
> > Keep-alive intentionally breaks TCP connections earlier than normal
> > in face of transient networking problems.
> > 
> > The question is more about which type of pain is more desirable. A
> > stall in the network connection (for a potentially very long time),
> > or an intentionally broken socket.
> > 
> > I'm not at all convinced it is a good idea to intentionally break
> > /all/ QEMU sockets in the face of transient problems, even if the
> > problems last for 2 hours or more.
> > 
> > I could see keep-alives being ok on some QEMU socket. For example
> > VNC/SPICE clients, as there is no downside to proactively culling
> > them as they can trivially reconnect. Migration too is quite
> > reasonable to use keep alives, as you generally want migration to
> > run to completion in a short amount of time, and aborting migration
> > needs to be safe no matter what.
> > 
> > Breaking chardevs or block devices or network devices that use
> > QEMU sockets though will be disruptive. The only solution once
> > those backends have a dead socket is going to be to kill QEMU
> > and cold-boot the VM again.
> > 
> 
> Reasonable, thanks for the explanation.
> 
> We are mostly interested in keep-alive for migration and NBD connections.
> (The NBD driver is able to reconnect.) What do you think about setting
> keep-alive (with some KEEPIDLE smaller than 2 hours) by default for
> migration and NBD (at least when NBD reconnect is enabled)? Would that
> be valid?

I think it is reasonable to set this by default for those particular
scenarios, as both expect failures and are ready to take action when
they occur.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] keepalive default
  2020-07-09  8:35 ` Daniel P. Berrangé
@ 2020-07-09 15:34   ` Eric Blake
  2020-07-09 17:14     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Blake @ 2020-07-09 15:34 UTC (permalink / raw)
  To: Daniel P. Berrangé, Vladimir Sementsov-Ogievskiy
  Cc: den, kraxel, qemu-devel, armbru

On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We understood, that keepalive is almost superfluous with default 2 hours
>> in /proc/tcp_keepalive_time. Forcing user to setup keepalive for the
>> whole system doesn't seem right, better setup it per-socket.
> 
> Adding the ability to explicitly configure the keepalive settings makes
> sense for QEMU. Completely ignoring system defaults when no explicit
> settings are given though is not valid IMHO.

We already have the ability to add per-socket keepalive (see commit 
aec21d3175, in 4.2).  I guess what you are trying to further do is 
determine whether the default value for that field, when not explicitly 
specified by the user, can have saner semantics (default off for chardev 
sockets, default on for nbd clients where retry was enabled).  But since 
you already have to explicitly opt-in to nbd retry, what's so hard about 
opting in to keepalive at the same time, other than more typing?  Given 
that the easiest way to do this is via a machine-coded generation of the 
command line or QMP command, it doesn't seem that hard to just keep 
things as they are with explicit opt-in to per-socket keepalive.
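The existing per-socket opt-in looks roughly like this at the QMP level (a hypothetical sketch built as a Python dict; the host, port, export name, and delay value are made up, and exact field availability depends on the QEMU version):

```python
# Hypothetical QMP command opting one NBD client socket into keep-alive.
# "keep-alive" is the InetSocketAddress member from commit aec21d3175;
# "reconnect-delay" is the NBD driver option discussed in this thread.
blockdev_add = {
    "execute": "blockdev-add",
    "arguments": {
        "driver": "nbd",
        "node-name": "nbd0",          # placeholder node name
        "server": {
            "type": "inet",
            "host": "192.0.2.10",     # placeholder address
            "port": "10809",
            "keep-alive": True,       # explicit per-socket opt-in
        },
        "export": "disk0",            # placeholder export name
        "reconnect-delay": 300,       # fail requests after 5 min offline
    },
}
print(blockdev_add["arguments"]["server"]["keep-alive"])  # True
```

Since the whole command is typically machine-generated anyway, adding the extra member costs one line in the generator.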

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] keepalive default
  2020-07-09 15:34   ` Eric Blake
@ 2020-07-09 17:14     ` Vladimir Sementsov-Ogievskiy
  2020-07-10 19:25       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-09 17:14 UTC (permalink / raw)
  To: Eric Blake, Daniel P. Berrangé; +Cc: den, kraxel, qemu-devel, armbru

09.07.2020 18:34, Eric Blake wrote:
> On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
>> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> We understood, that keepalive is almost superfluous with default 2 hours
>>> in /proc/tcp_keepalive_time. Forcing user to setup keepalive for the
>>> whole system doesn't seem right, better setup it per-socket.
>>
>> Adding the ability to explicitly configure the keepalive settings makes
>> sense for QEMU. Completely ignoring system defaults when no explicit
>> settings are given though is not valid IMHO.
> 
> We already have the ability to add per-socket keepalive (see commit aec21d3175, in 4.2).  I guess what you are trying to further do is determine whether the default value for that field, when not explicitly specified by the user, can have saner semantics (default off for chardev sockets, default on for nbd clients where retry was enabled).  But since you already have to explicitly opt-in to nbd retry, what's so hard about opting in to keepalive at the same time, other than more typing?  Given that the easiest way to do this is via a machine-coded generation of the command line or QMP command, it doesn't seem that hard to just keep things as they are with explicit opt-in to per-socket keepalive.
> 

Hmm. Looking at the code, I remember that reconnect is not optional; it works by default now. The option we have is "reconnect-delay", which only specifies after how many seconds we automatically fail requests rather than retrying them (0 seconds by default). Still, NBD tries to reconnect in the background anyway.

In our downstream we currently have an old version of the nbd-reconnect interface, with a non-zero "delay" enabled by default.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] keepalive default
  2020-07-09 17:14     ` Vladimir Sementsov-Ogievskiy
@ 2020-07-10 19:25       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-10 19:25 UTC (permalink / raw)
  To: Eric Blake, Daniel P. Berrangé; +Cc: den, kraxel, qemu-devel, armbru

09.07.2020 20:14, Vladimir Sementsov-Ogievskiy wrote:
> 09.07.2020 18:34, Eric Blake wrote:
>> On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
>>> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all!
>>>>
>>>> We understood, that keepalive is almost superfluous with default 2 hours
>>>> in /proc/tcp_keepalive_time. Forcing user to setup keepalive for the
>>>> whole system doesn't seem right, better setup it per-socket.
>>>
>>> Adding the ability to explicitly configure the keepalive settings makes
>>> sense for QEMU. Completely ignoring system defaults when no explicit
>>> settings are given though is not valid IMHO.
>>
>> We already have the ability to add per-socket keepalive (see commit aec21d3175, in 4.2).  I guess what you are trying to further do is determine whether the default value for that field, when not explicitly specified by the user, can have saner semantics (default off for chardev sockets, default on for nbd clients where retry was enabled).  But since you already have to explicitly opt-in to nbd retry, what's so hard about opting in to keepalive at the same time, other than more typing?  Given that the easiest way to do this is via a machine-coded generation of the command line or QMP command, it doesn't seem that hard to just keep things as they are with explicit opt-in to per-socket keepalive.
>>
> 
> Hmm. Looking at the code, I remember that reconnect is not optional; it works by default now. The option we have is "reconnect-delay", which only specifies after how many seconds we automatically fail requests rather than retrying them (0 seconds by default). Still, NBD tries to reconnect in the background anyway.
> 
> In our downstream we currently have an old version of the nbd-reconnect interface, with a non-zero "delay" enabled by default.
> 


And now we need to migrate to the upstream code. Hmm. So I have several options:

1. Set a downstream default for reconnect-delay of something like 5 min. [the simplest thing to do]
2. Set the upstream default to 5 min. [needs a discussion, but may be useful]
3. Force all users (not customers, I mean, but other teams (and even one other company)) to set the reconnect-delay option. [just for completeness; I will not go this way]


So, what do you think about [2]? This includes:

  - a non-zero reconnect-delay by default, so requests will wait some time for a reconnect and be retried
  - keep-alive enabled by default, with some default keep-alive parameters, so that we don't hang in recvmsg for a long time
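The trade-off behind these two defaults can be made concrete: the worst-case time for keep-alive to declare a peer dead is roughly idle + interval * count (a Python sketch; 7200/75/9 are the usual Linux system-wide defaults, while 30/30/20 are illustrative per-socket values like those floated in this series):

```python
def keepalive_detection_time(idle, interval, count):
    # Seconds from the last received data until the kernel resets an
    # unresponsive connection: the first probe fires after `idle`
    # seconds, then `count` probes go out, spaced `interval` seconds.
    return idle + interval * count

# Linux system-wide defaults (tcp_keepalive_time/intvl/probes):
print(keepalive_detection_time(7200, 75, 9))   # 7875 -> about 2.2 hours
# Illustrative per-socket values as discussed here:
print(keepalive_detection_time(30, 30, 20))    # 630 -> about 10.5 minutes
```

In other words, a dead NBD server would be noticed in minutes instead of hours, which is what makes a reconnect-delay of a few minutes actually usable.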


Some other related ideas:

  - non-blocking nbd_open
    - move open to the coroutine
    - use a non-blocking socket connect (for example, use qio_channel_socket_connect_async, which runs connect in a thread). Currently, if you try to blockdev-add an unavailable NBD host, the VM hangs during the connect() call, which may take about a minute
  - make blockdev-add an async QMP command (Kevin's series)
  - allow reconnect on open

Also, I recently noted that bdrv_close may hang due to reconnect: it does bdrv_flush, which waits for NBD to reconnect. This is inconvenient; probably we should disable reconnect before bdrv_drained_begin() in bdrv_close().

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-10 19:26 UTC | newest]

Thread overview: 14+ messages
2020-07-08 19:15 [PATCH 0/2] keepalive default Vladimir Sementsov-Ogievskiy
2020-07-08 19:15 ` [PATCH 1/2] sockets: keep-alive settings Vladimir Sementsov-Ogievskiy
2020-07-09  8:33   ` Daniel P. Berrangé
2020-07-08 19:15 ` [PATCH 2/2] util/qemu-sockets: make keep-alive enabled by default Vladimir Sementsov-Ogievskiy
2020-07-09  8:29   ` Daniel P. Berrangé
2020-07-09  8:49     ` Vladimir Sementsov-Ogievskiy
2020-07-09 11:52       ` Daniel P. Berrangé
2020-07-09  8:54     ` Denis V. Lunev
2020-07-09 11:40       ` Markus Armbruster
2020-07-08 19:41 ` [PATCH 0/2] keepalive default no-reply
2020-07-09  8:35 ` Daniel P. Berrangé
2020-07-09 15:34   ` Eric Blake
2020-07-09 17:14     ` Vladimir Sementsov-Ogievskiy
2020-07-10 19:25       ` Vladimir Sementsov-Ogievskiy
