All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port
@ 2017-06-23 10:31 Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen Knut Omang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Knut Omang @ 2017-06-23 10:31 UTC (permalink / raw)
  To: Daniel P . Berrange, Gerd Hoffmann, Paolo Bonzini; +Cc: qemu-devel, Knut Omang

This series contains:
* a unit test that exposes a race condition which causes QEMU to fail
  to find a port even when there is plenty of available ports.
* a refactor of the qemu-sockets inet_listen_saddr() function
  to better handle this situation.

Changes from v3:
* Test changes: Add missing license
  Add subtests for ipv4, ipv6 and both
  Various g_* usage improvements
* Split patch into 3 patches with two refactoring patches ahead
  of the actual fix.

Changes from v2:
* Non-trivial rebase + further abstraction
  on top of 7ad9af343c7f1c70c8015c7c519c312d8c5f9fa1
  'tests: add functional test validating ipv4/ipv6 address flag handling'

Changes from v1:
* Fix potential uninitialized variable only detected by optimize.
* Improve unexpected error detection in test-listen to give more
  details about why the test fails unexpectedly.
* Fix some line length style issues.

Thanks,
Knut

Knut Omang (4):
  tests: Add test-listen - a stress test for QEMU socket listen
  sockets: factor out create_fast_reuse_socket
  sockets: factor out a new try_bind() function
  sockets: Handle race condition between binds to the same port

 tests/Makefile.include |   2 +-
 tests/test-listen.c    | 253 ++++++++++++++++++++++++++++++++++++++++++-
 util/qemu-sockets.c    | 159 +++++++++++++++++---------
 3 files changed, 362 insertions(+), 52 deletions(-)
 create mode 100644 tests/test-listen.c

base-commit: 7ad9af343c7f1c70c8015c7c519c312d8c5f9fa1
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen
  2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
@ 2017-06-23 10:31 ` Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket Knut Omang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Knut Omang @ 2017-06-23 10:31 UTC (permalink / raw)
  To: Daniel P . Berrange, Gerd Hoffmann, Paolo Bonzini; +Cc: qemu-devel, Knut Omang

There's a potential race condition between multiple bind()'s
attempting to bind to the same port, which occasionally
allows more than one bind to succeed against the same port.

When a subsequent listen() call is made with the same socket
only one will succeed.

The current QEMU code does however not take this situation into account
and the listen will cause the code to break out and fail even
when there are actually available ports to use.

This test exposes two subtests:

/socket/listen-serial
/socket/listen-compete

The "compete" subtest creates a number of threads and have them all trying to bind
to the same port with a large enough offset input to
allow all threads to get it's own port.
The "serial" subtest just does the same, except in series in a
single thread.

The serial version passes, probably in most versions of QEMU.

The parallel version exposes the problem in a relatively reliable way,
eg. it fails a majority of times, but not with a 100% rate, occasional
passes can be seen. Nevertheless this is quite good given that
the bug was tricky to reproduce and has been left undetected for
a while.

The problem seems to be present in all versions of QEMU.

The original failure scenario occurred with VNC port allocation
in a traditional Xen based build, in different code
but with similar functionality.

Reported-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
---
 tests/Makefile.include |   2 +-
 tests/test-listen.c    | 253 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 255 insertions(+)
 create mode 100644 tests/test-listen.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 7180fe4..22bb97e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -127,6 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
 gcov-files-check-bufferiszero-y = util/bufferiszero.c
 check-unit-y += tests/test-uuid$(EXESUF)
 check-unit-y += tests/ptimer-test$(EXESUF)
+#check-unit-y += tests/test-listen$(EXESUF)
 gcov-files-ptimer-test-y = hw/core/ptimer.c
 check-unit-y += tests/test-qapi-util$(EXESUF)
 gcov-files-test-qapi-util-y = qapi/qapi-util.c
@@ -764,6 +765,7 @@ tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
 tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
 tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
 tests/numa-test$(EXESUF): tests/numa-test.o
+tests/test-listen$(EXESUF): tests/test-listen.o $(test-util-obj-y)
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
 	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/test-listen.c b/tests/test-listen.c
new file mode 100644
index 0000000..5c07537
--- /dev/null
+++ b/tests/test-listen.c
@@ -0,0 +1,253 @@
+/*
+ * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
+ *    Author: Knut Omang <knut.omang@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * Test parallel port listen configuration with
+ * dynamic port allocation
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu-common.h"
+#include "qemu/thread.h"
+#include "qemu/sockets.h"
+#include "qapi/error.h"
+
+#define NAME_LEN 1024
+#define PORT_LEN 16
+
+struct thr_info {
+    QemuThread thread;
+    int to_port;
+    bool ipv4;
+    bool ipv6;
+    int got_port;
+    int eno;
+    int fd;
+    const char *errstr;
+    char hostname[NAME_LEN + 1];
+    char port[PORT_LEN + 1];
+};
+
+
+/* These two functions taken from test-io-channel-socket.c */
+static int check_bind(const char *hostname, bool *has_proto)
+{
+    int fd = -1;
+    struct addrinfo ai, *res = NULL;
+    int rc;
+    int ret = -1;
+
+    memset(&ai, 0, sizeof(ai));
+    ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
+    ai.ai_family = AF_UNSPEC;
+    ai.ai_socktype = SOCK_STREAM;
+
+    /* lookup */
+    rc = getaddrinfo(hostname, NULL, &ai, &res);
+    if (rc != 0) {
+        if (rc == EAI_ADDRFAMILY ||
+            rc == EAI_FAMILY) {
+            *has_proto = false;
+            goto done;
+        }
+        goto cleanup;
+    }
+
+    fd = qemu_socket(res->ai_family, res->ai_socktype, res->ai_protocol);
+    if (fd < 0) {
+        goto cleanup;
+    }
+
+    if (bind(fd, res->ai_addr, res->ai_addrlen) < 0) {
+        if (errno == EADDRNOTAVAIL) {
+            *has_proto = false;
+            goto done;
+        }
+        goto cleanup;
+    }
+
+    *has_proto = true;
+ done:
+    ret = 0;
+
+ cleanup:
+    if (fd != -1) {
+        close(fd);
+    }
+    if (res) {
+        freeaddrinfo(res);
+    }
+    return ret;
+}
+
+static int check_protocol_support(bool *has_ipv4, bool *has_ipv6)
+{
+    if (check_bind("127.0.0.1", has_ipv4) < 0) {
+        return -1;
+    }
+    if (check_bind("::1", has_ipv6) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static void *listener_thread(void *arg)
+{
+    struct thr_info *thr = (struct thr_info *)arg;
+    SocketAddress addr = {
+        .type = SOCKET_ADDRESS_TYPE_INET,
+        .u = {
+            .inet = {
+                .host = thr->hostname,
+                .port = thr->port,
+                .has_ipv4 = thr->ipv4,
+                .ipv4 = thr->ipv4,
+                .has_ipv6 = thr->ipv6,
+                .ipv6 = thr->ipv6,
+                .has_to = true,
+                .to = thr->to_port,
+            },
+        },
+    };
+    Error *err = NULL;
+    int fd;
+
+    fd = socket_listen(&addr, &err);
+    if (fd < 0) {
+        thr->eno = errno;
+        thr->errstr = error_get_pretty(err);
+    } else {
+        struct sockaddr_in a;
+        socklen_t a_len = sizeof(a);
+        g_assert_cmpint(getsockname(fd, (struct sockaddr *)&a, &a_len), ==, 0);
+        thr->got_port = ntohs(a.sin_port);
+        thr->fd = fd;
+    }
+    return arg;
+}
+
+
+static void listen_compete_nthr(bool threaded, int nthreads,
+                                int start_port, int max_offset,
+                                bool ipv4, bool ipv6)
+{
+    int i;
+    int failed_listens = 0;
+    struct thr_info *thr = g_new0(struct thr_info, nthreads);
+    int used[max_offset + 1];
+
+    memset(used, 0, sizeof(used));
+    for (i = 0; i < nthreads; i++) {
+        snprintf(thr[i].port, PORT_LEN, "%d", start_port);
+        strcpy(thr[i].hostname, "localhost");
+        thr[i].to_port = start_port + max_offset;
+        thr[i].ipv4 = ipv4;
+        thr[i].ipv6 = ipv6;
+    }
+
+    for (i = 0; i < nthreads; i++) {
+        if (threaded) {
+            qemu_thread_create(&thr[i].thread, "listener",
+                               listener_thread, &thr[i],
+                               QEMU_THREAD_JOINABLE);
+        } else {
+            listener_thread(&thr[i]);
+        }
+    }
+
+    if (threaded) {
+        for (i = 0; i < nthreads; i++) {
+            qemu_thread_join(&thr[i].thread);
+        }
+    }
+    for (i = 0; i < nthreads; i++) {
+        if (thr[i].got_port) {
+            closesocket(thr[i].fd);
+        }
+    }
+
+    for (i = 0; i < nthreads; i++) {
+        if (thr[i].eno != 0) {
+            const char *m;
+            g_printerr("** Failed to assign a port to thread %d (errno = %d)\n",
+                   i, thr[i].eno);
+            /* This is what we are interested in capturing -
+             * catch and report details if something unexpected happens:
+             */
+            m = strstr(thr[i].errstr, "Failed to listen on socket");
+            if (m != NULL) {
+                g_assert_cmpstr(thr[i].errstr, ==,
+                    "Failed to listen on socket: Address already in use");
+            }
+            failed_listens++;
+        } else {
+            int assigned_port = thr[i].got_port;
+            g_assert_cmpint(assigned_port, <= , thr[i].to_port);
+            g_assert_cmpint(used[assigned_port - start_port], == , 0);
+        }
+    }
+    g_assert_cmpint(failed_listens, ==, 0);
+    g_free(thr);
+}
+
+
+static void listen_compete_ipv4(void)
+{
+    listen_compete_nthr(true, 200, 5920, 300, true, false);
+}
+
+static void listen_serial_ipv4(void)
+{
+    listen_compete_nthr(false, 200, 6300, 300, true, false);
+}
+
+static void listen_compete_ipv6(void)
+{
+    listen_compete_nthr(true, 200, 5920, 300, true, false);
+}
+
+static void listen_serial_ipv6(void)
+{
+    listen_compete_nthr(false, 200, 6300, 300, false, true);
+}
+
+static void listen_compete_gen(void)
+{
+    listen_compete_nthr(true, 200, 5920, 300, true, true);
+}
+
+static void listen_serial_gen(void)
+{
+    listen_compete_nthr(false, 200, 6300, 300, true, true);
+}
+
+
+int main(int argc, char **argv)
+{
+    bool has_ipv4, has_ipv6;
+    g_test_init(&argc, &argv, NULL);
+
+    if (check_protocol_support(&has_ipv4, &has_ipv6) < 0) {
+        return 1;
+    }
+
+    if (has_ipv4) {
+        g_test_add_func("/socket/listen-serial/ipv4", listen_serial_ipv4);
+        g_test_add_func("/socket/listen-compete/ipv4", listen_compete_ipv4);
+    }
+    if (has_ipv6) {
+        g_test_add_func("/socket/listen-serial/ipv6", listen_serial_ipv6);
+        g_test_add_func("/socket/listen-compete/ipv6", listen_compete_ipv6);
+    }
+    if (has_ipv4 && has_ipv6) {
+        g_test_add_func("/socket/listen-serial/generic", listen_serial_gen);
+        g_test_add_func("/socket/listen-compete/generic", listen_compete_gen);
+    }
+    return g_test_run();
+}
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket
  2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen Knut Omang
@ 2017-06-23 10:31 ` Knut Omang
  2017-06-26 10:28   ` Daniel P. Berrange
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 3/4] sockets: factor out a new try_bind() function Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
  3 siblings, 1 reply; 15+ messages in thread
From: Knut Omang @ 2017-06-23 10:31 UTC (permalink / raw)
  To: Daniel P . Berrange, Gerd Hoffmann, Paolo Bonzini; +Cc: qemu-devel, Knut Omang

First refactoring step to prepare for fixing the problem
exposed with the test-listen test in the previous commit

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 util/qemu-sockets.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 852773d..699e36c 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
     return PF_UNSPEC;
 }
 
+static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
+{
+    int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+    if (slisten < 0) {
+        if (!e->ai_next) {
+            error_setg_errno(errp, errno, "Failed to create socket");
+        }
+        return -1;
+    }
+
+    socket_set_fast_reuse(slisten);
+    return slisten;
+}
+
 static int inet_listen_saddr(InetSocketAddress *saddr,
                              int port_offset,
                              bool update_addr,
@@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
         return -1;
     }
 
-    /* create socket + bind */
+    /* create socket + bind/listen */
     for (e = res; e != NULL; e = e->ai_next) {
         getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
 		        uaddr,INET6_ADDRSTRLEN,uport,32,
 		        NI_NUMERICHOST | NI_NUMERICSERV);
-        slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+
+        slisten = create_fast_reuse_socket(e, &err);
         if (slisten < 0) {
-            if (!e->ai_next) {
-                error_setg_errno(errp, errno, "Failed to create socket");
-            }
             continue;
         }
 
-        socket_set_fast_reuse(slisten);
-
         port_min = inet_getport(e);
         port_max = saddr->has_to ? saddr->to + port_offset : port_min;
         for (p = port_min; p <= port_max; p++) {
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v4 3/4] sockets: factor out a new try_bind() function
  2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket Knut Omang
@ 2017-06-23 10:31 ` Knut Omang
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
  3 siblings, 0 replies; 15+ messages in thread
From: Knut Omang @ 2017-06-23 10:31 UTC (permalink / raw)
  To: Daniel P . Berrange, Gerd Hoffmann, Paolo Bonzini; +Cc: qemu-devel, Knut Omang

Another refactoring step to prepare for the problem
exposed by the test-listen test.

This time simplify and reorganize the IPv6 specific extra
measures and move it out of the for loop to increase
code readability. No semantic changes.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
 util/qemu-sockets.c | 69 ++++++++++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 699e36c..48b9319 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -163,6 +163,44 @@ static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
     return slisten;
 }
 
+static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
+{
+#ifndef IPV6_V6ONLY
+    return bind(socket, e->ai_addr, e->ai_addrlen);
+#else
+    /*
+     * Deals with first & last cases in matrix in comment
+     * for inet_ai_family_from_address().
+     */
+    int v6only =
+        ((!saddr->has_ipv4 && !saddr->has_ipv6) ||
+         (saddr->has_ipv4 && saddr->ipv4 &&
+          saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1;
+    int stat;
+
+ rebind:
+    if (e->ai_family == PF_INET6) {
+        qemu_setsockopt(socket, IPPROTO_IPV6, IPV6_V6ONLY, &v6only,
+                        sizeof(v6only));
+    }
+
+    stat = bind(socket, e->ai_addr, e->ai_addrlen);
+    if (!stat) {
+        return 0;
+    }
+
+    /* If we got EADDRINUSE from an IPv6 bind & v6only is unset,
+     * it could be that the IPv4 port is already claimed, so retry
+     * with v6only set
+     */
+    if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) {
+        v6only = 1;
+        goto rebind;
+    }
+    return stat;
+#endif
+}
+
 static int inet_listen_saddr(InetSocketAddress *saddr,
                              int port_offset,
                              bool update_addr,
@@ -238,39 +276,10 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
         port_min = inet_getport(e);
         port_max = saddr->has_to ? saddr->to + port_offset : port_min;
         for (p = port_min; p <= port_max; p++) {
-#ifdef IPV6_V6ONLY
-            /*
-             * Deals with first & last cases in matrix in comment
-             * for inet_ai_family_from_address().
-             */
-            int v6only =
-                ((!saddr->has_ipv4 && !saddr->has_ipv6) ||
-                 (saddr->has_ipv4 && saddr->ipv4 &&
-                  saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1;
-#endif
             inet_setport(e, p);
-#ifdef IPV6_V6ONLY
-        rebind:
-            if (e->ai_family == PF_INET6) {
-                qemu_setsockopt(slisten, IPPROTO_IPV6, IPV6_V6ONLY, &v6only,
-                                sizeof(v6only));
-            }
-#endif
-            if (bind(slisten, e->ai_addr, e->ai_addrlen) == 0) {
+            if (try_bind(slisten, saddr, e) >= 0) {
                 goto listen;
             }
-
-#ifdef IPV6_V6ONLY
-            /* If we got EADDRINUSE from an IPv6 bind & V6ONLY is unset,
-             * it could be that the IPv4 port is already claimed, so retry
-             * with V6ONLY set
-             */
-            if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) {
-                v6only = 1;
-                goto rebind;
-            }
-#endif
-
             if (p == port_max) {
                 if (!e->ai_next) {
                     error_setg_errno(errp, errno, "Failed to bind socket");
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
                   ` (2 preceding siblings ...)
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 3/4] sockets: factor out a new try_bind() function Knut Omang
@ 2017-06-23 10:31 ` Knut Omang
  2017-06-26 10:22   ` Daniel P. Berrange
  2017-06-26 10:34   ` Daniel P. Berrange
  3 siblings, 2 replies; 15+ messages in thread
From: Knut Omang @ 2017-06-23 10:31 UTC (permalink / raw)
  To: Daniel P . Berrange, Gerd Hoffmann, Paolo Bonzini; +Cc: qemu-devel, Knut Omang

If an offset of ports is specified to the inet_listen_saddr function(),
and two or more processes tries to bind from these ports at the same time,
occasionally more than one process may be able to bind to the same
port. The condition is detected by listen() but too late to avoid a failure.

This function is called by socket_listen() and used
by all socket listening code in QEMU, so all cases where any form of dynamic
port selection is used should be subject to this issue.

Add code to close and re-establish the socket when this
condition is observed, hiding the race condition from the user.

This has been developed and tested by means of the
test-listen unit test in the previous commit.
Enable the test for make check now that it passes.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
---
 tests/Makefile.include |  2 +-
 util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 22bb97e..c38f94e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
 gcov-files-check-bufferiszero-y = util/bufferiszero.c
 check-unit-y += tests/test-uuid$(EXESUF)
 check-unit-y += tests/ptimer-test$(EXESUF)
-#check-unit-y += tests/test-listen$(EXESUF)
+check-unit-y += tests/test-listen$(EXESUF)
 gcov-files-ptimer-test-y = hw/core/ptimer.c
 check-unit-y += tests/test-qapi-util$(EXESUF)
 gcov-files-test-qapi-util-y = qapi/qapi-util.c
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 48b9319..7b118b4 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
 #endif
 }
 
+static int try_bind_listen(int *socket, InetSocketAddress *saddr,
+                           struct addrinfo *e, int port, Error **errp)
+{
+    int s = *socket;
+    int ret;
+
+    inet_setport(e, port);
+    ret = try_bind(s, saddr, e);
+    if (ret) {
+        if (errno != EADDRINUSE) {
+            error_setg_errno(errp, errno, "Failed to bind socket");
+        }
+        return errno;
+    }
+    if (listen(s, 1) == 0) {
+            return 0;
+    }
+    if (errno == EADDRINUSE) {
+        /* We got to bind the socket to a port but someone else managed
+         * to bind to the same port and beat us to listen on it!
+         * Recreate the socket and return EADDRINUSE to preserve the
+         * expected state by the caller:
+         */
+        closesocket(s);
+        s = create_fast_reuse_socket(e, errp);
+        if (s < 0) {
+            return errno;
+        }
+        *socket = s;
+        errno = EADDRINUSE;
+        return errno;
+    }
+    error_setg_errno(errp, errno, "Failed to listen on socket");
+    return errno;
+}
+
 static int inet_listen_saddr(InetSocketAddress *saddr,
                              int port_offset,
                              bool update_addr,
@@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
     char port[33];
     char uaddr[INET6_ADDRSTRLEN+1];
     char uport[33];
-    int slisten, rc, port_min, port_max, p;
+    int rc, port_min, port_max, p;
+    int slisten = 0;
+    int saved_errno = 0;
     Error *err = NULL;
 
     memset(&ai,0, sizeof(ai));
@@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
         port_min = inet_getport(e);
         port_max = saddr->has_to ? saddr->to + port_offset : port_min;
         for (p = port_min; p <= port_max; p++) {
-            inet_setport(e, p);
-            if (try_bind(slisten, saddr, e) >= 0) {
-                goto listen;
-            }
-            if (p == port_max) {
-                if (!e->ai_next) {
-                    error_setg_errno(errp, errno, "Failed to bind socket");
-                }
+            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
+            if (!eno) {
+                goto listen_ok;
+            } else if (eno != EADDRINUSE) {
+                goto listen_failed;
             }
         }
+    }
+    error_setg_errno(errp, errno, "Failed to find available port");
+
+listen_failed:
+    saved_errno = errno;
+    if (slisten >= 0) {
         closesocket(slisten);
     }
     freeaddrinfo(res);
+    errno = saved_errno;
     return -1;
 
-listen:
-    if (listen(slisten,1) != 0) {
-        error_setg_errno(errp, errno, "Failed to listen on socket");
-        closesocket(slisten);
-        freeaddrinfo(res);
-        return -1;
-    }
+listen_ok:
     if (update_addr) {
         g_free(saddr->host);
         saddr->host = g_strdup(uaddr);
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
@ 2017-06-26 10:22   ` Daniel P. Berrange
  2017-06-26 12:32     ` Knut Omang
  2017-06-26 10:34   ` Daniel P. Berrange
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrange @ 2017-06-26 10:22 UTC (permalink / raw)
  To: Knut Omang; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> If an offset of ports is specified to the inet_listen_saddr function(),
> and two or more processes tries to bind from these ports at the same time,
> occasionally more than one process may be able to bind to the same
> port. The condition is detected by listen() but too late to avoid a failure.
> 
> This function is called by socket_listen() and used
> by all socket listening code in QEMU, so all cases where any form of dynamic
> port selection is used should be subject to this issue.
> 
> Add code to close and re-establish the socket when this
> condition is observed, hiding the race condition from the user.
> 
> This has been developed and tested by means of the
> test-listen unit test in the previous commit.
> Enable the test for make check now that it passes.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> ---
>  tests/Makefile.include |  2 +-
>  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
>  2 files changed, 53 insertions(+), 17 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 22bb97e..c38f94e 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
>  gcov-files-check-bufferiszero-y = util/bufferiszero.c
>  check-unit-y += tests/test-uuid$(EXESUF)
>  check-unit-y += tests/ptimer-test$(EXESUF)
> -#check-unit-y += tests/test-listen$(EXESUF)
> +check-unit-y += tests/test-listen$(EXESUF)
>  gcov-files-ptimer-test-y = hw/core/ptimer.c
>  check-unit-y += tests/test-qapi-util$(EXESUF)
>  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 48b9319..7b118b4 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>  #endif
>  }
>  
> +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> +                           struct addrinfo *e, int port, Error **errp)
> +{
> +    int s = *socket;
> +    int ret;
> +
> +    inet_setport(e, port);
> +    ret = try_bind(s, saddr, e);
> +    if (ret) {
> +        if (errno != EADDRINUSE) {
> +            error_setg_errno(errp, errno, "Failed to bind socket");
> +        }
> +        return errno;
> +    }
> +    if (listen(s, 1) == 0) {
> +            return 0;
> +    }
> +    if (errno == EADDRINUSE) {
> +        /* We got to bind the socket to a port but someone else managed
> +         * to bind to the same port and beat us to listen on it!
> +         * Recreate the socket and return EADDRINUSE to preserve the
> +         * expected state by the caller:
> +         */
> +        closesocket(s);
> +        s = create_fast_reuse_socket(e, errp);
> +        if (s < 0) {
> +            return errno;
> +        }
> +        *socket = s;

I don't really like this at all - if we need to close + recreate the
socket, IMHO that should remain the job of the caller, since it owns
the socket FD ultimately.

> +        errno = EADDRINUSE;
> +        return errno;
> +    }
> +    error_setg_errno(errp, errno, "Failed to listen on socket");
> +    return errno;
> +}
> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>                               int port_offset,
>                               bool update_addr,
> @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>      char port[33];
>      char uaddr[INET6_ADDRSTRLEN+1];
>      char uport[33];
> -    int slisten, rc, port_min, port_max, p;
> +    int rc, port_min, port_max, p;
> +    int slisten = 0;
> +    int saved_errno = 0;
>      Error *err = NULL;
>  
>      memset(&ai,0, sizeof(ai));
> @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,

Just above this line is the original 'create_fast_reuse_socket' call.

I'd suggest that we push that call down into the body of the loop
below:

>          port_min = inet_getport(e);
>          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
>          for (p = port_min; p <= port_max; p++) {
> -            inet_setport(e, p);
> -            if (try_bind(slisten, saddr, e) >= 0) {
> -                goto listen;
> -            }
> -            if (p == port_max) {
> -                if (!e->ai_next) {
> -                    error_setg_errno(errp, errno, "Failed to bind socket");
> -                }
> +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);

Which would mean try_bind_listen no longer needs the magic to close +
recreate the socket.

The only cost of doing this is that you end up closing + recreating the
socket after bind hits EADDRINUSE, as well as after listen() hits it.

I think that's acceptable tradeoff for simpler code, since this is not
a performance critical operation.

> +            if (!eno) {
> +                goto listen_ok;
> +            } else if (eno != EADDRINUSE) {
> +                goto listen_failed;
>              }
>          }
> +    }
> +    error_setg_errno(errp, errno, "Failed to find available port");

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket Knut Omang
@ 2017-06-26 10:28   ` Daniel P. Berrange
  2017-06-26 11:56     ` Knut Omang
  2017-07-02  6:26     ` Knut Omang
  0 siblings, 2 replies; 15+ messages in thread
From: Daniel P. Berrange @ 2017-06-26 10:28 UTC (permalink / raw)
  To: Knut Omang; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Fri, Jun 23, 2017 at 12:31:06PM +0200, Knut Omang wrote:
> First refactoring step to prepare for fixing the problem
> exposed with the test-listen test in the previous commit
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
>  util/qemu-sockets.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 852773d..699e36c 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
>      return PF_UNSPEC;
>  }
>  
> +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
> +{
> +    int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> +    if (slisten < 0) {
> +        if (!e->ai_next) {
> +            error_setg_errno(errp, errno, "Failed to create socket");
> +        }

I think that having this method sometimes report an error message, and
sometimes not report an error message, depending on state of a variable
used by the caller is rather unpleasant. I'd much rather see this
error message reporting remain in the caller.

> +        return -1;
> +    }
> +
> +    socket_set_fast_reuse(slisten);
> +    return slisten;
> +}
> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>                               int port_offset,
>                               bool update_addr,
> @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>          return -1;
>      }
>  
> -    /* create socket + bind */
> +    /* create socket + bind/listen */
>      for (e = res; e != NULL; e = e->ai_next) {
>          getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
>  		        uaddr,INET6_ADDRSTRLEN,uport,32,
>  		        NI_NUMERICHOST | NI_NUMERICSERV);
> -        slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> +
> +        slisten = create_fast_reuse_socket(e, &err);
>          if (slisten < 0) {
> -            if (!e->ai_next) {
> -                error_setg_errno(errp, errno, "Failed to create socket");
> -            }
>              continue;

It isn't shown in this diff context, but at the end of the outer
loop we have

   error_setg_errno(errp, errno, "Failed to find available port");

so IIUC, even this pre-existing code is wrong. If 'e->ai_next' is
NULL, we report an error message here. Then, we continue to the
next loop iteration, which causes use to terminate the loop
entirely. At which point we'll report another error message
over the top of the one we already have. So I think the error
reporting does still need refactoring, but not in the way it
is done here.

>          }
>  
> -        socket_set_fast_reuse(slisten);
> -
>          port_min = inet_getport(e);
>          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
>          for (p = port_min; p <= port_max; p++) {

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
  2017-06-26 10:22   ` Daniel P. Berrange
@ 2017-06-26 10:34   ` Daniel P. Berrange
  2017-07-02  8:15     ` Knut Omang
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrange @ 2017-06-26 10:34 UTC (permalink / raw)
  To: Knut Omang; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> If an offset of ports is specified to the inet_listen_saddr function(),
> and two or more processes tries to bind from these ports at the same time,
> occasionally more than one process may be able to bind to the same
> port. The condition is detected by listen() but too late to avoid a failure.
> 
> This function is called by socket_listen() and used
> by all socket listening code in QEMU, so all cases where any form of dynamic
> port selection is used should be subject to this issue.
> 
> Add code to close and re-establish the socket when this
> condition is observed, hiding the race condition from the user.
> 
> This has been developed and tested by means of the
> test-listen unit test in the previous commit.
> Enable the test for make check now that it passes.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> ---
>  tests/Makefile.include |  2 +-
>  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
>  2 files changed, 53 insertions(+), 17 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 22bb97e..c38f94e 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
>  gcov-files-check-bufferiszero-y = util/bufferiszero.c
>  check-unit-y += tests/test-uuid$(EXESUF)
>  check-unit-y += tests/ptimer-test$(EXESUF)
> -#check-unit-y += tests/test-listen$(EXESUF)
> +check-unit-y += tests/test-listen$(EXESUF)
>  gcov-files-ptimer-test-y = hw/core/ptimer.c
>  check-unit-y += tests/test-qapi-util$(EXESUF)
>  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 48b9319..7b118b4 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>  #endif
>  }
>  
> +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> +                           struct addrinfo *e, int port, Error **errp)
> +{
> +    int s = *socket;
> +    int ret;
> +
> +    inet_setport(e, port);
> +    ret = try_bind(s, saddr, e);
> +    if (ret) {
> +        if (errno != EADDRINUSE) {
> +            error_setg_errno(errp, errno, "Failed to bind socket");
> +        }
> +        return errno;
> +    }
> +    if (listen(s, 1) == 0) {
> +            return 0;
> +    }
> +    if (errno == EADDRINUSE) {
> +        /* We got to bind the socket to a port but someone else managed
> +         * to bind to the same port and beat us to listen on it!
> +         * Recreate the socket and return EADDRINUSE to preserve the
> +         * expected state by the caller:
> +         */
> +        closesocket(s);
> +        s = create_fast_reuse_socket(e, errp);

This usage scenario for create_fast_reuse_socket() makes its error
reporting behaviour even more wrong. Recall that create_fast_reuse_socket
is reporting an error if e->ai_next is NULL, which is a way of determining
this is the last call to create_fast_reuse_socket in the loop. That
assumption is violated though now that we're calling the method from
inside the inner loop. Even when e->ai_next is NULL, we may be calling
create_fast_reuse_socket many many times due to the port  'to' range.

> +        if (s < 0) {
> +            return errno;
> +        }
> +        *socket = s;
> +        errno = EADDRINUSE;
> +        return errno;
> +    }
> +    error_setg_errno(errp, errno, "Failed to listen on socket");
> +    return errno;
> +}

This method is both preserving the global errno, and returning the
global errno. The caller expects global errno to be preserved, so
I think we can just return '-1' from this method.

> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>                               int port_offset,
>                               bool update_addr,
> @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>      char port[33];
>      char uaddr[INET6_ADDRSTRLEN+1];
>      char uport[33];
> -    int slisten, rc, port_min, port_max, p;
> +    int rc, port_min, port_max, p;
> +    int slisten = 0;
> +    int saved_errno = 0;
>      Error *err = NULL;
>  
>      memset(&ai,0, sizeof(ai));
> @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>          port_min = inet_getport(e);
>          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
>          for (p = port_min; p <= port_max; p++) {
> -            inet_setport(e, p);
> -            if (try_bind(slisten, saddr, e) >= 0) {
> -                goto listen;
> -            }
> -            if (p == port_max) {
> -                if (!e->ai_next) {
> -                    error_setg_errno(errp, errno, "Failed to bind socket");
> -                }
> +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
> +            if (!eno) {
> +                goto listen_ok;
> +            } else if (eno != EADDRINUSE) {
> +                goto listen_failed;
>              }
>          }
> +    }
> +    error_setg_errno(errp, errno, "Failed to find available port");
> +
> +listen_failed:
> +    saved_errno = errno;
> +    if (slisten >= 0) {
>          closesocket(slisten);
>      }
>      freeaddrinfo(res);
> +    errno = saved_errno;
>      return -1;
>  
> -listen:
> -    if (listen(slisten,1) != 0) {
> -        error_setg_errno(errp, errno, "Failed to listen on socket");
> -        closesocket(slisten);
> -        freeaddrinfo(res);
> -        return -1;
> -    }
> +listen_ok:
>      if (update_addr) {
>          g_free(saddr->host);
>          saddr->host = g_strdup(uaddr);
> -- 
> git-series 0.9.1

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket
  2017-06-26 10:28   ` Daniel P. Berrange
@ 2017-06-26 11:56     ` Knut Omang
  2017-06-26 12:00       ` Daniel P. Berrange
  2017-07-02  6:26     ` Knut Omang
  1 sibling, 1 reply; 15+ messages in thread
From: Knut Omang @ 2017-06-26 11:56 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, 2017-06-26 at 11:28 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 23, 2017 at 12:31:06PM +0200, Knut Omang wrote:
> > First refactoring step to prepare for fixing the problem
> > exposed with the test-listen test in the previous commit
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> >  util/qemu-sockets.c | 24 +++++++++++++++++-------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 852773d..699e36c 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
> >      return PF_UNSPEC;
> >  }
> >  
> > +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
> > +{
> > +    int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> > +    if (slisten < 0) {
> > +        if (!e->ai_next) {
> > +            error_setg_errno(errp, errno, "Failed to create socket");
> > +        }
> 
> I think that having this method sometimes report an error message, and
> sometimes not report an error message, depending on state of a variable
> used by the caller is rather unpleasant. I'd much rather see this
> error message reporting remain in the caller.

In principle I agree with you, but I think we do want to keep the details 
of what the failure cause was by also propagating information about the system 
call that failed.

I considered this an acceptable trade-off in the name of performance as well as
readability at the next level. This is a fairly unlikely case that one really does not
have to worry too much about at the next level. Setting an error that does not get used
for that special, unlikely case is not that bad. Doing it for all failures would be 
a lot more unnecessary work.

> 
> > +        return -1;
> > +    }
> > +
> > +    socket_set_fast_reuse(slisten);
> > +    return slisten;
> > +}
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >                               int port_offset,
> >                               bool update_addr,
> > @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >          return -1;
> >      }
> >  
> > -    /* create socket + bind */
> > +    /* create socket + bind/listen */
> >      for (e = res; e != NULL; e = e->ai_next) {
> >          getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
> >  		        uaddr,INET6_ADDRSTRLEN,uport,32,
> >  		        NI_NUMERICHOST | NI_NUMERICSERV);
> > -        slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> > +
> > +        slisten = create_fast_reuse_socket(e, &err);
> >          if (slisten < 0) {
> > -            if (!e->ai_next) {
> > -                error_setg_errno(errp, errno, "Failed to create socket");
> > -            }
> >              continue;
> 
> It isn't shown in this diff context, but at the end of the outer
> loop we have
> 
>    error_setg_errno(errp, errno, "Failed to find available port");
> 
> so IIUC, even this pre-existing code is wrong. If 'e->ai_next' is
> NULL, we report an error message here. Then, we continue to the
> next loop iteration, which causes use to terminate the loop
> entirely. At which point we'll report another error message
> over the top of the one we already have. 
>
> So I think the error
> reporting does still need refactoring, but not in the way it
> is done here.

I agree, a simple way to solve it would be to only set errp if no error has already been
set.

Thanks,
Knut

> >          }
> >  
> > -        socket_set_fast_reuse(slisten);
> > -
> >          port_min = inet_getport(e);
> >          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >          for (p = port_min; p <= port_max; p++) {
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket
  2017-06-26 11:56     ` Knut Omang
@ 2017-06-26 12:00       ` Daniel P. Berrange
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel P. Berrange @ 2017-06-26 12:00 UTC (permalink / raw)
  To: Knut Omang; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, Jun 26, 2017 at 01:56:28PM +0200, Knut Omang wrote:
> On Mon, 2017-06-26 at 11:28 +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 23, 2017 at 12:31:06PM +0200, Knut Omang wrote:
> > > First refactoring step to prepare for fixing the problem
> > > exposed with the test-listen test in the previous commit
> > > 
> > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > ---
> > >  util/qemu-sockets.c | 24 +++++++++++++++++-------
> > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > > index 852773d..699e36c 100644
> > > --- a/util/qemu-sockets.c
> > > +++ b/util/qemu-sockets.c
> > > @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
> > >      return PF_UNSPEC;
> > >  }
> > >  
> > > +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
> > > +{
> > > +    int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> > > +    if (slisten < 0) {
> > > +        if (!e->ai_next) {
> > > +            error_setg_errno(errp, errno, "Failed to create socket");
> > > +        }
> > 
> > I think that having this method sometimes report an error message, and
> > sometimes not report an error message, depending on state of a variable
> > used by the caller is rather unpleasant. I'd much rather see this
> > error message reporting remain in the caller.
> 
> In principle I agree with you, but I think we do want to keep the details 
> of what the failure cause was by also propagating information about the system 
> call that failed.
> 
> I considered this an acceptable trade-off in the name of performance as well as
> readability at the next level. This is a fairly unlikely case that one really does not
> have to worry too much about at the next level. Setting an error that does not get used
> for that special, unlikely case is not that bad. Doing it for all failures would be 
> a lot more unnecessary work.
> 
> > 
> > > +        return -1;
> > > +    }
> > > +
> > > +    socket_set_fast_reuse(slisten);
> > > +    return slisten;
> > > +}
> > > +
> > >  static int inet_listen_saddr(InetSocketAddress *saddr,
> > >                               int port_offset,
> > >                               bool update_addr,
> > > @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> > >          return -1;
> > >      }
> > >  
> > > -    /* create socket + bind */
> > > +    /* create socket + bind/listen */
> > >      for (e = res; e != NULL; e = e->ai_next) {
> > >          getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
> > >  		        uaddr,INET6_ADDRSTRLEN,uport,32,
> > >  		        NI_NUMERICHOST | NI_NUMERICSERV);
> > > -        slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
> > > +
> > > +        slisten = create_fast_reuse_socket(e, &err);
> > >          if (slisten < 0) {
> > > -            if (!e->ai_next) {
> > > -                error_setg_errno(errp, errno, "Failed to create socket");
> > > -            }
> > >              continue;
> > 
> > It isn't shown in this diff context, but at the end of the outer
> > loop we have
> > 
> >    error_setg_errno(errp, errno, "Failed to find available port");
> > 
> > so IIUC, even this pre-existing code is wrong. If 'e->ai_next' is
> > NULL, we report an error message here. Then, we continue to the
> > next loop iteration, which causes use to terminate the loop
> > entirely. At which point we'll report another error message
> > over the top of the one we already have. 
> >
> > So I think the error
> > reporting does still need refactoring, but not in the way it
> > is done here.
> 
> I agree, a simple way to solve it would be to only set errp if no
> error has already been set.

I'd like to see each of the methods set the error unconditionally.
In the inet_socket_listen() method, we can use an then intermediate
Error variable on each iteration of the loop, and propagate to
the global variable at the end or discard it.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-26 10:22   ` Daniel P. Berrange
@ 2017-06-26 12:32     ` Knut Omang
  2017-06-26 12:49       ` Daniel P. Berrange
  0 siblings, 1 reply; 15+ messages in thread
From: Knut Omang @ 2017-06-26 12:32 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, 2017-06-26 at 11:22 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> > If an offset of ports is specified to the inet_listen_saddr function(),
> > and two or more processes tries to bind from these ports at the same time,
> > occasionally more than one process may be able to bind to the same
> > port. The condition is detected by listen() but too late to avoid a failure.
> > 
> > This function is called by socket_listen() and used
> > by all socket listening code in QEMU, so all cases where any form of dynamic
> > port selection is used should be subject to this issue.
> > 
> > Add code to close and re-establish the socket when this
> > condition is observed, hiding the race condition from the user.
> > 
> > This has been developed and tested by means of the
> > test-listen unit test in the previous commit.
> > Enable the test for make check now that it passes.
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> > Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> > Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> > ---
> >  tests/Makefile.include |  2 +-
> >  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
> >  2 files changed, 53 insertions(+), 17 deletions(-)
> > 
> > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > index 22bb97e..c38f94e 100644
> > --- a/tests/Makefile.include
> > +++ b/tests/Makefile.include
> > @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
> >  gcov-files-check-bufferiszero-y = util/bufferiszero.c
> >  check-unit-y += tests/test-uuid$(EXESUF)
> >  check-unit-y += tests/ptimer-test$(EXESUF)
> > -#check-unit-y += tests/test-listen$(EXESUF)
> > +check-unit-y += tests/test-listen$(EXESUF)
> >  gcov-files-ptimer-test-y = hw/core/ptimer.c
> >  check-unit-y += tests/test-qapi-util$(EXESUF)
> >  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 48b9319..7b118b4 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct
> addrinfo *e)
> >  #endif
> >  }
> >  
> > +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> > +                           struct addrinfo *e, int port, Error **errp)
> > +{
> > +    int s = *socket;
> > +    int ret;
> > +
> > +    inet_setport(e, port);
> > +    ret = try_bind(s, saddr, e);
> > +    if (ret) {
> > +        if (errno != EADDRINUSE) {
> > +            error_setg_errno(errp, errno, "Failed to bind socket");
> > +        }
> > +        return errno;
> > +    }
> > +    if (listen(s, 1) == 0) {
> > +            return 0;
> > +    }
> > +    if (errno == EADDRINUSE) {
> > +        /* We got to bind the socket to a port but someone else managed
> > +         * to bind to the same port and beat us to listen on it!
> > +         * Recreate the socket and return EADDRINUSE to preserve the
> > +         * expected state by the caller:
> > +         */
> > +        closesocket(s);
> > +        s = create_fast_reuse_socket(e, errp);
> > +        if (s < 0) {
> > +            return errno;
> > +        }
> > +        *socket = s;
> 
> I don't really like this at all - if we need to close + recreate the
> socket, IMHO that should remain the job of the caller, since it owns
> the socket FD ultimately.

Normally I would agree, but this is a very unlikely situation. I considered moving the
complexity out to the caller, even to recreate for every call, but found those solutions
to be inferior as they do not in any way confine the problem, and cause the handling of
the common cases to be much less readable. It's going to be some trade-offs here.

As long as the caller is aware of (by the reference call) that the socket in use may
change, this is in my view a clean (as clean as possible) abstraction that simplifies the
logic at the next level. My intention is to make the common, good case as readable as
possible and hide some of the complexity of these 
unlikely error scenarios inside the new functions - divide and conquer..

> 
> > +        errno = EADDRINUSE;
> > +        return errno;
> > +    }
> > +    error_setg_errno(errp, errno, "Failed to listen on socket");
> > +    return errno;
> > +}
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >                               int port_offset,
> >                               bool update_addr,
> > @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >      char port[33];
> >      char uaddr[INET6_ADDRSTRLEN+1];
> >      char uport[33];
> > -    int slisten, rc, port_min, port_max, p;
> > +    int rc, port_min, port_max, p;
> > +    int slisten = 0;
> > +    int saved_errno = 0;
> >      Error *err = NULL;
> >  
> >      memset(&ai,0, sizeof(ai));
> > @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> 
> Just above this line is the original 'create_fast_reuse_socket' call.
> 
> I'd suggest that we push that call down into the body of the loop
> below:
> 
> >          port_min = inet_getport(e);
> >          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >          for (p = port_min; p <= port_max; p++) {
> > -            inet_setport(e, p);
> > -            if (try_bind(slisten, saddr, e) >= 0) {
> > -                goto listen;
> > -            }
> > -            if (p == port_max) {
> > -                if (!e->ai_next) {
> > -                    error_setg_errno(errp, errno, "Failed to bind socket");
> > -                }
> > +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
> 
> Which would mean try_bind_listen no longer needs the magic to close +
> recreate the socket.
> 
> The only cost of doing this is that you end up closing + recreating the
> socket after bind hits EADDRINUSE, as well as after listen() hits it.

The problem with this approach in my opinion is that one has to understand the
fix for the problem I am trying to solve here in order to read the main code, 
even though this is a very special case. Everyone reading the code would ask themselves
the question 'why do they recreate the socket here?' and then be forced to ready the
details of try_bind_listen anyway, or we would need additional comments.

The idea behind the abstractions I have used here is to hide the details inside functions,
but leave them with an as clean as possible (although not ideal) interface that 
makes the overall logic more readable.

> I think that's acceptable tradeoff for simpler code, since this is not
> a performance critical operation.

Also should we perhaps worry about any side effects of creating and closing a lot of
sockets unnecessary?

Thanks,
Knut

> 
> > +            if (!eno) {
> > +                goto listen_ok;
> > +            } else if (eno != EADDRINUSE) {
> > +                goto listen_failed;
> >              }
> >          }
> > +    }
> > +    error_setg_errno(errp, errno, "Failed to find available port");
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-26 12:32     ` Knut Omang
@ 2017-06-26 12:49       ` Daniel P. Berrange
  2017-07-02  8:17         ` Knut Omang
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrange @ 2017-06-26 12:49 UTC (permalink / raw)
  To: Knut Omang; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, Jun 26, 2017 at 02:32:48PM +0200, Knut Omang wrote:
> On Mon, 2017-06-26 at 11:22 +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> > > If an offset of ports is specified to the inet_listen_saddr function(),
> > > and two or more processes tries to bind from these ports at the same time,
> > > occasionally more than one process may be able to bind to the same
> > > port. The condition is detected by listen() but too late to avoid a failure.
> > > 
> > > This function is called by socket_listen() and used
> > > by all socket listening code in QEMU, so all cases where any form of dynamic
> > > port selection is used should be subject to this issue.
> > > 
> > > Add code to close and re-establish the socket when this
> > > condition is observed, hiding the race condition from the user.
> > > 
> > > This has been developed and tested by means of the
> > > test-listen unit test in the previous commit.
> > > Enable the test for make check now that it passes.
> > > 
> > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> > > Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> > > Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> > > ---
> > >  tests/Makefile.include |  2 +-
> > >  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
> > >  2 files changed, 53 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > > index 22bb97e..c38f94e 100644
> > > --- a/tests/Makefile.include
> > > +++ b/tests/Makefile.include
> > > @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
> > >  gcov-files-check-bufferiszero-y = util/bufferiszero.c
> > >  check-unit-y += tests/test-uuid$(EXESUF)
> > >  check-unit-y += tests/ptimer-test$(EXESUF)
> > > -#check-unit-y += tests/test-listen$(EXESUF)
> > > +check-unit-y += tests/test-listen$(EXESUF)
> > >  gcov-files-ptimer-test-y = hw/core/ptimer.c
> > >  check-unit-y += tests/test-qapi-util$(EXESUF)
> > >  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> > > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > > index 48b9319..7b118b4 100644
> > > --- a/util/qemu-sockets.c
> > > +++ b/util/qemu-sockets.c
> > > @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct
> > addrinfo *e)
> > >  #endif
> > >  }
> > >  
> > > +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> > > +                           struct addrinfo *e, int port, Error **errp)
> > > +{
> > > +    int s = *socket;
> > > +    int ret;
> > > +
> > > +    inet_setport(e, port);
> > > +    ret = try_bind(s, saddr, e);
> > > +    if (ret) {
> > > +        if (errno != EADDRINUSE) {
> > > +            error_setg_errno(errp, errno, "Failed to bind socket");
> > > +        }
> > > +        return errno;
> > > +    }
> > > +    if (listen(s, 1) == 0) {
> > > +            return 0;
> > > +    }
> > > +    if (errno == EADDRINUSE) {
> > > +        /* We got to bind the socket to a port but someone else managed
> > > +         * to bind to the same port and beat us to listen on it!
> > > +         * Recreate the socket and return EADDRINUSE to preserve the
> > > +         * expected state by the caller:
> > > +         */
> > > +        closesocket(s);
> > > +        s = create_fast_reuse_socket(e, errp);
> > > +        if (s < 0) {
> > > +            return errno;
> > > +        }
> > > +        *socket = s;
> > 
> > I don't really like this at all - if we need to close + recreate the
> > socket, IMHO that should remain the job of the caller, since it owns
> > the socket FD ultimately.
> 
> Normally I would agree, but this is a very unlikely situation. I considered moving the
> complexity out to the caller, even to recreate for every call, but found those solutions
> to be inferior as they do not in any way confine the problem, and cause the handling of
> the common cases to be much less readable. It's going to be some trade-offs here.
> 
> As long as the caller is aware of (by the reference call) that the socket in use may
> change, this is in my view a clean (as clean as possible) abstraction that simplifies the
> logic at the next level. My intention is to make the common, good case as readable as
> possible and hide some of the complexity of these 
> unlikely error scenarios inside the new functions - divide and conquer..
> 
> > 
> > > +        errno = EADDRINUSE;
> > > +        return errno;
> > > +    }
> > > +    error_setg_errno(errp, errno, "Failed to listen on socket");
> > > +    return errno;
> > > +}
> > > +
> > >  static int inet_listen_saddr(InetSocketAddress *saddr,
> > >                               int port_offset,
> > >                               bool update_addr,
> > > @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> > >      char port[33];
> > >      char uaddr[INET6_ADDRSTRLEN+1];
> > >      char uport[33];
> > > -    int slisten, rc, port_min, port_max, p;
> > > +    int rc, port_min, port_max, p;
> > > +    int slisten = 0;
> > > +    int saved_errno = 0;
> > >      Error *err = NULL;
> > >  
> > >      memset(&ai,0, sizeof(ai));
> > > @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> > 
> > Just above this line is the original 'create_fast_reuse_socket' call.
> > 
> > I'd suggest that we push that call down into the body of the loop
> > below:
> > 
> > >          port_min = inet_getport(e);
> > >          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> > >          for (p = port_min; p <= port_max; p++) {
> > > -            inet_setport(e, p);
> > > -            if (try_bind(slisten, saddr, e) >= 0) {
> > > -                goto listen;
> > > -            }
> > > -            if (p == port_max) {
> > > -                if (!e->ai_next) {
> > > -                    error_setg_errno(errp, errno, "Failed to bind socket");
> > > -                }
> > > +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
> > 
> > Which would mean try_bind_listen no longer needs the magic to close +
> > recreate the socket.
> > 
> > The only cost of doing this is that you end up closing + recreating the
> > socket after bind hits EADDRINUSE, as well as after listen() hits it.
> 
> The problem with this approach in my opinion is that one has to understand the
> fix for the problem I am trying to solve here in order to read the main code, 
> even though this is a very special case. Everyone reading the code would ask themselves
> the question 'why do they recreate the socket here?' and then be forced to ready the
> details of try_bind_listen anyway, or we would need additional comments.

That's easily solved by adding a comment

  /* We recreate the socket FD on each iteration because
     if bind succeeds & listen fails, we can't bind
     again on the same socket FD */

> The idea behind the abstractions I have used here is to hide the details inside functions,
> but leave them with an as clean as possible (although not ideal) interface that 
> makes the overall logic more readable.

I think the result is actually harder to understand, because of the
peculiar way the function closes & reopens the socket FD belonging
to the caller, and the error handling is really very unclear and
buggy as a result too.

> > I think that's acceptable tradeoff for simpler code, since this is not
> > a performance critical operation.
> 
> Also should we perhaps worry about any side effects of creating and closing a lot of
> sockets unnecessary?

What side effects ? I don't think there are any - since this is server
side, not client side, we're not leaving any state around in timed waits
or similar.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket
  2017-06-26 10:28   ` Daniel P. Berrange
  2017-06-26 11:56     ` Knut Omang
@ 2017-07-02  6:26     ` Knut Omang
  1 sibling, 0 replies; 15+ messages in thread
From: Knut Omang @ 2017-07-02  6:26 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, 2017-06-26 at 11:28 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 23, 2017 at 12:31:06PM +0200, Knut Omang wrote:
> > 
> > First refactoring step to prepare for fixing the problem
> > exposed with the test-listen test in the previous commit
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> >  util/qemu-sockets.c | 24 +++++++++++++++++-------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 852773d..699e36c 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress
> > *addr,
> >      return PF_UNSPEC;
> >  }
> >  
> > +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
> > +{
> > +    int slisten = qemu_socket(e->ai_family, e->ai_socktype, e-
> > >ai_protocol);
> > +    if (slisten < 0) {
> > +        if (!e->ai_next) {
> > +            error_setg_errno(errp, errno, "Failed to create socket");
> > +        }
> 
> I think that having this method sometimes report an error message, and
> sometimes not report an error message, depending on state of a variable
> used by the caller is rather unpleasant. I'd much rather see this
> error message reporting remain in the caller.
>
> > 
> > +        return -1;
> > +    }
> > +
> > +    socket_set_fast_reuse(slisten);
> > +    return slisten;
> > +}
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >                               int port_offset,
> >                               bool update_addr,
> > @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >          return -1;
> >      }
> >  
> > -    /* create socket + bind */
> > +    /* create socket + bind/listen */
> >      for (e = res; e != NULL; e = e->ai_next) {
> >          getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
> >  		        uaddr,INET6_ADDRSTRLEN,uport,32,
> >  		        NI_NUMERICHOST | NI_NUMERICSERV);
> > -        slisten = qemu_socket(e->ai_family, e->ai_socktype, e-
> > >ai_protocol);
> > +
> > +        slisten = create_fast_reuse_socket(e, &err);
> >          if (slisten < 0) {
> > -            if (!e->ai_next) {
> > -                error_setg_errno(errp, errno, "Failed to create socket");
> > -            }
> >              continue;
> 
> It isn't shown in this diff context, but at the end of the outer
> loop we have
> 
>    error_setg_errno(errp, errno, "Failed to find available port");
> 
> so IIUC, even this pre-existing code is wrong. If 'e->ai_next' is
> NULL, we report an error message here. Then, we continue to the
> next loop iteration, which causes use to terminate the loop
> entirely. At which point we'll report another error message
> over the top of the one we already have. So I think the error
> reporting does still need refactoring, but not in the way it
> is done here.

Yes, I did scratch my head about this but I tried to keep the original semantics
to avoid mixing unrelated changes.

With the split into separate refactoring commits we are beyond that anyway.

I'll have a second look at it..

Thanks,
Knut

> 
> > 
> >          }
> >  
> > -        socket_set_fast_reuse(slisten);
> > -
> >          port_min = inet_getport(e);
> >          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >          for (p = port_min; p <= port_max; p++) {
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-26 10:34   ` Daniel P. Berrange
@ 2017-07-02  8:15     ` Knut Omang
  0 siblings, 0 replies; 15+ messages in thread
From: Knut Omang @ 2017-07-02  8:15 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, 2017-06-26 at 11:34 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> > 
> > If an offset of ports is specified to the inet_listen_saddr function(),
> > and two or more processes tries to bind from these ports at the same time,
> > occasionally more than one process may be able to bind to the same
> > port. The condition is detected by listen() but too late to avoid a failure.
> > 
> > This function is called by socket_listen() and used
> > by all socket listening code in QEMU, so all cases where any form of dynamic
> > port selection is used should be subject to this issue.
> > 
> > Add code to close and re-establish the socket when this
> > condition is observed, hiding the race condition from the user.
> > 
> > This has been developed and tested by means of the
> > test-listen unit test in the previous commit.
> > Enable the test for make check now that it passes.
> > 
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> > Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> > Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> > ---
> >  tests/Makefile.include |  2 +-
> >  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
> >  2 files changed, 53 insertions(+), 17 deletions(-)
> > 
> > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > index 22bb97e..c38f94e 100644
> > --- a/tests/Makefile.include
> > +++ b/tests/Makefile.include
> > @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
> >  gcov-files-check-bufferiszero-y = util/bufferiszero.c
> >  check-unit-y += tests/test-uuid$(EXESUF)
> >  check-unit-y += tests/ptimer-test$(EXESUF)
> > -#check-unit-y += tests/test-listen$(EXESUF)
> > +check-unit-y += tests/test-listen$(EXESUF)
> >  gcov-files-ptimer-test-y = hw/core/ptimer.c
> >  check-unit-y += tests/test-qapi-util$(EXESUF)
> >  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 48b9319..7b118b4 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress
> > *saddr, struct addrinfo *e)
> >  #endif
> >  }
> >  
> > +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> > +                           struct addrinfo *e, int port, Error **errp)
> > +{
> > +    int s = *socket;
> > +    int ret;
> > +
> > +    inet_setport(e, port);
> > +    ret = try_bind(s, saddr, e);
> > +    if (ret) {
> > +        if (errno != EADDRINUSE) {
> > +            error_setg_errno(errp, errno, "Failed to bind socket");
> > +        }
> > +        return errno;
> > +    }
> > +    if (listen(s, 1) == 0) {
> > +            return 0;
> > +    }
> > +    if (errno == EADDRINUSE) {
> > +        /* We got to bind the socket to a port but someone else managed
> > +         * to bind to the same port and beat us to listen on it!
> > +         * Recreate the socket and return EADDRINUSE to preserve the
> > +         * expected state by the caller:
> > +         */
> > +        closesocket(s);
> > +        s = create_fast_reuse_socket(e, errp);
> 
> This usage scenario for create_fast_reuse_socket() makes its error
> reporting behaviour even more wrong. Recall that create_fast_reuse_socket
> is reporting an error if e->ai_next is NULL, which is a way of determining
> this is the last call to create_fast_reuse_socket in the loop. That
> assumption is violated though now that we're calling the method from
> inside the inner loop. Even when e->ai_next is NULL, we may be calling
> create_fast_reuse_socket many many times due to the port  'to' range.

I agree that the error reporting should go out of create_fast_reuse_socket().
Note however that this code will only be called when the race condition occurs,
which I think is very unlikely to happen more than once for each call to
inet_listen_saddr (except in my test of course..)

> 
> > 
> > +        if (s < 0) {
> > +            return errno;
> > +        }
> > +        *socket = s;
> > +        errno = EADDRINUSE;
> > +        return errno;
> > +    }
> > +    error_setg_errno(errp, errno, "Failed to listen on socket");
> > +    return errno;
> > +}
> 
> This method is both preserving the global errno, and returning the
> global errno. The caller expects global errno to be preserved, so
> I think we can just return '-1' from this method.

will do,

Thanks,
Knut

> 
> > 
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >                               int port_offset,
> >                               bool update_addr,
> > @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >      char port[33];
> >      char uaddr[INET6_ADDRSTRLEN+1];
> >      char uport[33];
> > -    int slisten, rc, port_min, port_max, p;
> > +    int rc, port_min, port_max, p;
> > +    int slisten = 0;
> > +    int saved_errno = 0;
> >      Error *err = NULL;
> >  
> >      memset(&ai,0, sizeof(ai));
> > @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >          port_min = inet_getport(e);
> >          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >          for (p = port_min; p <= port_max; p++) {
> > -            inet_setport(e, p);
> > -            if (try_bind(slisten, saddr, e) >= 0) {
> > -                goto listen;
> > -            }
> > -            if (p == port_max) {
> > -                if (!e->ai_next) {
> > -                    error_setg_errno(errp, errno, "Failed to bind socket");
> > -                }
> > +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
> > +            if (!eno) {
> > +                goto listen_ok;
> > +            } else if (eno != EADDRINUSE) {
> > +                goto listen_failed;
> >              }
> >          }
> > +    }
> > +    error_setg_errno(errp, errno, "Failed to find available port");
> > +
> > +listen_failed:
> > +    saved_errno = errno;
> > +    if (slisten >= 0) {
> >          closesocket(slisten);
> >      }
> >      freeaddrinfo(res);
> > +    errno = saved_errno;
> >      return -1;
> >  
> > -listen:
> > -    if (listen(slisten,1) != 0) {
> > -        error_setg_errno(errp, errno, "Failed to listen on socket");
> > -        closesocket(slisten);
> > -        freeaddrinfo(res);
> > -        return -1;
> > -    }
> > +listen_ok:
> >      if (update_addr) {
> >          g_free(saddr->host);
> >          saddr->host = g_strdup(uaddr);
> > -- 
> > git-series 0.9.1
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
  2017-06-26 12:49       ` Daniel P. Berrange
@ 2017-07-02  8:17         ` Knut Omang
  0 siblings, 0 replies; 15+ messages in thread
From: Knut Omang @ 2017-07-02  8:17 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Gerd Hoffmann, Paolo Bonzini, qemu-devel

On Mon, 2017-06-26 at 13:49 +0100, Daniel P. Berrange wrote:
> On Mon, Jun 26, 2017 at 02:32:48PM +0200, Knut Omang wrote:
> > 
> > On Mon, 2017-06-26 at 11:22 +0100, Daniel P. Berrange wrote:
> > > 
> > > On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> > > > 
> > > > If an offset of ports is specified to the inet_listen_saddr function(),
> > > > and two or more processes tries to bind from these ports at the same
> > > > time,
> > > > occasionally more than one process may be able to bind to the same
> > > > port. The condition is detected by listen() but too late to avoid a
> > > > failure.
> > > >  
> > > > This function is called by socket_listen() and used
> > > > by all socket listening code in QEMU, so all cases where any form of
> > > > dynamic
> > > > port selection is used should be subject to this issue.
> > > >  
> > > > Add code to close and re-establish the socket when this
> > > > condition is observed, hiding the race condition from the user.
> > > >  
> > > > This has been developed and tested by means of the
> > > > test-listen unit test in the previous commit.
> > > > Enable the test for make check now that it passes.
> > > >  
> > > > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > > > Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> > > > Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> > > > Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> > > > ---
> > > >   tests/Makefile.include |  2 +-
> > > >   util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++--------
> > > > ---
> > > >   2 files changed, 53 insertions(+), 17 deletions(-)
> > > >  
> > > > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > > > index 22bb97e..c38f94e 100644
> > > > --- a/tests/Makefile.include
> > > > +++ b/tests/Makefile.include
> > > > @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
> > > >   gcov-files-check-bufferiszero-y = util/bufferiszero.c
> > > >   check-unit-y += tests/test-uuid$(EXESUF)
> > > >   check-unit-y += tests/ptimer-test$(EXESUF)
> > > > -#check-unit-y += tests/test-listen$(EXESUF)
> > > > +check-unit-y += tests/test-listen$(EXESUF)
> > > >   gcov-files-ptimer-test-y = hw/core/ptimer.c
> > > >   check-unit-y += tests/test-qapi-util$(EXESUF)
> > > >   gcov-files-test-qapi-util-y = qapi/qapi-util.c
> > > > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > > > index 48b9319..7b118b4 100644
> > > > --- a/util/qemu-sockets.c
> > > > +++ b/util/qemu-sockets.c
> > > > @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress
> > > > *saddr, struct
> > > addrinfo *e)
> > > > 
> > > >   #endif
> > > >   }
> > > >   
> > > > +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> > > > +                           struct addrinfo *e, int port, Error **errp)
> > > > +{
> > > > +    int s = *socket;
> > > > +    int ret;
> > > > +
> > > > +    inet_setport(e, port);
> > > > +    ret = try_bind(s, saddr, e);
> > > > +    if (ret) {
> > > > +        if (errno != EADDRINUSE) {
> > > > +            error_setg_errno(errp, errno, "Failed to bind socket");
> > > > +        }
> > > > +        return errno;
> > > > +    }
> > > > +    if (listen(s, 1) == 0) {
> > > > +            return 0;
> > > > +    }
> > > > +    if (errno == EADDRINUSE) {
> > > > +        /* We got to bind the socket to a port but someone else managed
> > > > +         * to bind to the same port and beat us to listen on it!
> > > > +         * Recreate the socket and return EADDRINUSE to preserve the
> > > > +         * expected state by the caller:
> > > > +         */
> > > > +        closesocket(s);
> > > > +        s = create_fast_reuse_socket(e, errp);
> > > > +        if (s < 0) {
> > > > +            return errno;
> > > > +        }
> > > > +        *socket = s;
> > > 
> > > I don't really like this at all - if we need to close + recreate the
> > > socket, IMHO that should remain the job of the caller, since it owns
> > > the socket FD ultimately.
> > 
> > Normally I would agree, but this is a very unlikely situation. I considered
> > moving the
> > complexity out to the caller, even to recreate for every call, but found
> > those solutions
> > to be inferior as they do not in any way confine the problem, and cause the
> > handling of
> > the common cases to be much less readable. It's going to be some trade-offs
> > here.
> > 
> > As long as the caller is aware of (by the reference call) that the socket in
> > use may
> > change, this is in my view a clean (as clean as possible) abstraction that
> > simplifies the
> > logic at the next level. My intention is to make the common, good case as
> > readable as
> > possible and hide some of the complexity of these 
> > unlikely error scenarios inside the new functions - divide and conquer..
> > 
> > > 
> > > 
> > > > 
> > > > +        errno = EADDRINUSE;
> > > > +        return errno;
> > > > +    }
> > > > +    error_setg_errno(errp, errno, "Failed to listen on socket");
> > > > +    return errno;
> > > > +}
> > > > +
> > > >   static int inet_listen_saddr(InetSocketAddress *saddr,
> > > >                                int port_offset,
> > > >                                bool update_addr,
> > > > @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress
> > > > *saddr,
> > > >       char port[33];
> > > >       char uaddr[INET6_ADDRSTRLEN+1];
> > > >       char uport[33];
> > > > -    int slisten, rc, port_min, port_max, p;
> > > > +    int rc, port_min, port_max, p;
> > > > +    int slisten = 0;
> > > > +    int saved_errno = 0;
> > > >       Error *err = NULL;
> > > >   
> > > >       memset(&ai,0, sizeof(ai));
> > > > @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress
> > > > *saddr,
> > > 
> > > Just above this line is the original 'create_fast_reuse_socket' call.
> > > 
> > > I'd suggest that we push that call down into the body of the loop
> > > below:
> > > 
> > > > 
> > > >           port_min = inet_getport(e);
> > > >           port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> > > >           for (p = port_min; p <= port_max; p++) {
> > > > -            inet_setport(e, p);
> > > > -            if (try_bind(slisten, saddr, e) >= 0) {
> > > > -                goto listen;
> > > > -            }
> > > > -            if (p == port_max) {
> > > > -                if (!e->ai_next) {
> > > > -                    error_setg_errno(errp, errno, "Failed to bind
> > > > socket");
> > > > -                }
> > > > +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);
> > > 
> > > Which would mean try_bind_listen no longer needs the magic to close +
> > > recreate the socket.
> > > 
> > > The only cost of doing this is that you end up closing + recreating the
> > > socket after bind hits EADDRINUSE, as well as after listen() hits it.
> > 
> > The problem with this approach in my opinion is that one has to understand
> > the
> > fix for the problem I am trying to solve here in order to read the main
> > code, 
> > even though this is a very special case. Everyone reading the code would ask
> > themselves
> > the question 'why do they recreate the socket here?' and then be forced to
> > ready the
> > details of try_bind_listen anyway, or we would need additional comments.
> 
> That's easily solved by adding a comment
> 
>   /* We recreate the socket FD on each iteration because
>      if bind succeeds & listen fails, we can't bind
>      again on the same socket FD */
> 
> > 
> > The idea behind the abstractions I have used here is to hide the details
> > inside functions,
> > but leave them with an as clean as possible (although not ideal) interface
> > that 
> > makes the overall logic more readable.
> 
> I think the result is actually harder to understand, because of the
> peculiar way the function closes & reopens the socket FD belonging
> to the caller, and the error handling is really very unclear and
> buggy as a result too.

I assume we have sorted out the error reporting issues in the other part of the
thread. Although the reopen of the socket this way is not ideal, I still think
it is the best tradeoff but I'll have a second look at it,

> > 
> > > I think that's acceptable tradeoff for simpler code, since this is not
> > > a performance critical operation.
> > 
> > Also should we perhaps worry about any side effects of creating and closing
> > a lot of
> > sockets unnecessary?
> 
> What side effects ? I don't think there are any - since this is server
> side, not client side, we're not leaving any state around in timed waits
> or similar.

If you say so.
I would not know for all platforms involved without further work
testing/investigating.

If you don't mind I still think it is better to avoid socket recreation for all
cases just to handle this (very rare) condition.

Thanks for the thorough review,

Regards,
Knut

> 
> Regards,
> Daniel
> -- 
> > 
> > : https://berrange.com      -o-    https://www.flickr.com/photos/dberrange
> > :|
> > : https://libvirt.org         -o-            https://fstop138.berrange.com
> > :|
> > : https://entangle-photo.org    -o-    https://www.instagram.com/dberrange
> > :|

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-07-02  8:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket Knut Omang
2017-06-26 10:28   ` Daniel P. Berrange
2017-06-26 11:56     ` Knut Omang
2017-06-26 12:00       ` Daniel P. Berrange
2017-07-02  6:26     ` Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 3/4] sockets: factor out a new try_bind() function Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
2017-06-26 10:22   ` Daniel P. Berrange
2017-06-26 12:32     ` Knut Omang
2017-06-26 12:49       ` Daniel P. Berrange
2017-07-02  8:17         ` Knut Omang
2017-06-26 10:34   ` Daniel P. Berrange
2017-07-02  8:15     ` Knut Omang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.