All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] nbd reconnect on open
@ 2021-09-06 19:06 Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp Vladimir Sementsov-Ogievskiy
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

Hi all!

After a long delay here is v3.

v3 is rebased on top of big refactoring of nbd connection code, and on
top of last portion of it, not yet merged:
Based-on: <20210902103805.25686-1-vsementsov@virtuozzo.com>
   "[PATCH v6 0/5] block/nbd: drop connection_co"

So, the core patch (02) is changed a lot. QAPI interface added.

Vladimir Sementsov-Ogievskiy (9):
  nbd/client-connection: nbd_co_establish_connection(): fix non set errp
  qapi: make blockdev-add a coroutine command
  nbd: allow reconnect on open, with corresponding new options
  nbd/client-connection: nbd_co_establish_connection(): return real
    error
  nbd/client-connection: improve error message of cancelled attempt
  iotests.py: add qemu_tool_popen()
  iotests.py: add and use qemu_io_wrap_args()
  iotests.py: add qemu_io_popen()
  iotests: add nbd-reconnect-on-open test

 qapi/block-core.json                          | 12 +++-
 block/nbd.c                                   | 45 +++++++++++-
 nbd/client-connection.c                       | 56 +++++++++++----
 tests/qemu-iotests/iotests.py                 | 39 ++++++----
 .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++++++++++++++++++
 .../tests/nbd-reconnect-on-open.out           | 11 +++
 6 files changed, 203 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
 create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

-- 
2.29.2



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-07 17:44   ` Eric Blake
  2021-09-06 19:06 ` [PATCH v3 2/9] qapi: make blockdev-add a coroutine command Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

When we don't have a connection and blocking is false, we return NULL
but don't set errp. That's wrong.

We have two paths for calling nbd_co_establish_connection():

1. nbd_open() -> nbd_do_establish_connection() -> ...
  but that will never set blocking=false

2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
  but that uses errp=NULL

So, we are safe with our wrong errp policy in
nbd_co_establish_connection(). Still let's fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client-connection.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 7123b1e189..695f855754 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -318,6 +318,7 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
         }
 
         if (!blocking) {
+            error_setg(errp, "No connection at the moment");
             return NULL;
         }
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/9] qapi: make blockdev-add a coroutine command
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-06 19:28   ` Markus Armbruster
  2021-09-06 19:06 ` [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

We are going to support nbd reconnect on open in a next commit. This
means that we want to do several connection attempts during some time.
And this should be done in a coroutine, otherwise we'll stuck.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06674c25c9..6e4042530a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4219,7 +4219,8 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
+{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
+  'coroutine': true }
 
 ##
 # @blockdev-reopen:
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 2/9] qapi: make blockdev-add a coroutine command Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-07 20:36   ` Eric Blake
  2021-09-06 19:06 ` [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

It is useful when start of vm and start of nbd server are not
simple to sync.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json |  9 ++++++++-
 block/nbd.c          | 45 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6e4042530a..30d491bcd4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3994,6 +3994,12 @@
 #                   future requests before a successful reconnect will
 #                   immediately fail. Default 0 (Since 4.2)
 #
+# @open-timeout: In seconds. If zero, the nbd driver tries the connection
+#                only once, and fails to open if the connection fails.
+#                If non-zero, the nbd driver will repeat connection attempts
+#                until successful or until @open-timeout seconds have elapsed.
+#                Default 0 (Since 6.2)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsNbd',
@@ -4001,7 +4007,8 @@
             '*export': 'str',
             '*tls-creds': 'str',
             '*x-dirty-bitmap': 'str',
-            '*reconnect-delay': 'uint32' } }
+            '*reconnect-delay': 'uint32',
+            '*open-timeout': 'uint32' } }
 
 ##
 # @BlockdevOptionsRaw:
diff --git a/block/nbd.c b/block/nbd.c
index 306b2de9f2..38a503102c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -80,6 +80,7 @@ typedef struct BDRVNBDState {
     NBDClientState state;
 
     QEMUTimer *reconnect_delay_timer;
+    QEMUTimer *open_timer;
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
@@ -87,6 +88,7 @@ typedef struct BDRVNBDState {
 
     /* Connection parameters */
     uint32_t reconnect_delay;
+    uint32_t open_timeout;
     SocketAddress *saddr;
     char *export, *tlscredsid;
     QCryptoTLSCreds *tlscreds;
@@ -218,6 +220,32 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     s->state = NBD_CLIENT_QUIT;
 }
 
+static void open_timer_del(BDRVNBDState *s)
+{
+    if (s->open_timer) {
+        timer_free(s->open_timer);
+        s->open_timer = NULL;
+    }
+}
+
+static void open_timer_cb(void *opaque)
+{
+    BDRVNBDState *s = opaque;
+
+    nbd_co_establish_connection_cancel(s->conn);
+    open_timer_del(s);
+}
+
+static void open_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
+{
+    assert(!s->open_timer);
+    s->open_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
+                                  QEMU_CLOCK_REALTIME,
+                                  SCALE_NS,
+                                  open_timer_cb, s);
+    timer_mod(s->open_timer, expire_time_ns);
+}
+
 static bool nbd_client_connecting(BDRVNBDState *s)
 {
     NBDClientState state = qatomic_load_acquire(&s->state);
@@ -1737,6 +1765,15 @@ static QemuOptsList nbd_runtime_opts = {
                     "future requests before a successful reconnect will "
                     "immediately fail. Default 0",
         },
+        {
+            .name = "open-timeout",
+            .type = QEMU_OPT_NUMBER,
+            .help = "In seconds. If zero, the nbd driver tries the connection "
+                    "only once, and fails to open if the connection fails. "
+                    "If non-zero, the nbd driver will repeat connection "
+                    "attempts until successful or until @open-timeout seconds "
+                    "have elapsed. Default 0",
+        },
         { /* end of list */ }
     },
 };
@@ -1792,6 +1829,7 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options,
     }
 
     s->reconnect_delay = qemu_opt_get_number(opts, "reconnect-delay", 0);
+    s->open_timeout = qemu_opt_get_number(opts, "open-timeout", 0);
 
     ret = 0;
 
@@ -1823,7 +1861,12 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     s->conn = nbd_client_connection_new(s->saddr, true, s->export,
                                         s->x_dirty_bitmap, s->tlscreds);
 
-    /* TODO: Configurable retry-until-timeout behaviour. */
+    if (s->open_timeout) {
+        nbd_client_connection_enable_retry(s->conn);
+        open_timer_init(s, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+                        s->open_timeout * NANOSECONDS_PER_SECOND);
+    }
+
     s->state = NBD_CLIENT_CONNECTING_WAIT;
     ret = nbd_do_establish_connection(bs, errp);
     if (ret < 0) {
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-07 20:42   ` Eric Blake
  2021-09-06 19:06 ` [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

The only user of errp is call to nbd_do_establish_connection() in
nbd_open(). The only way to cancel this call is through open_timer
timeout. And for this case, user will be more interested in description
of last failed connect rather than in
"Connection attempt cancelled by other operation".

So, let's change behavior on cancel to return previous failure error if
available.

Do the same for non-blocking failure case. In this case we still don't
have a caller that is interested in errp. But let's be consistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client-connection.c | 50 ++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 695f855754..722998c985 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -39,16 +39,18 @@ struct NBDClientConnection {
 
     QemuMutex mutex;
 
+    NBDExportInfo updated_info;
     /*
-     * @sioc and @err represent a connection attempt.  While running
-     * is true, they are only used by the connection thread, and mutex
-     * locking is not needed.  Once the thread finishes,
-     * nbd_co_establish_connection then steals these pointers while
-     * under the mutex.
+     * @sioc represents a successful result. While thread is running, @sioc is
+     * used only by thread and not protected by mutex. When thread is not
+     * running, @sioc is stolen by nbd_co_establish_connection() under mutex.
      */
-    NBDExportInfo updated_info;
     QIOChannelSocket *sioc;
     QIOChannel *ioc;
+    /*
+     * @err represents previous attempt. It may be copied by
+     * nbd_co_establish_connection() when it reports failure.
+     */
     Error *err;
 
     /* All further fields are accessed only under mutex */
@@ -170,18 +172,18 @@ static void *connect_thread_func(void *opaque)
 
     qemu_mutex_lock(&conn->mutex);
     while (!conn->detached) {
+        Error *local_err = NULL;
+
         assert(!conn->sioc);
         conn->sioc = qio_channel_socket_new();
 
         qemu_mutex_unlock(&conn->mutex);
 
-        error_free(conn->err);
-        conn->err = NULL;
         conn->updated_info = conn->initial_info;
 
         ret = nbd_connect(conn->sioc, conn->saddr,
                           conn->do_negotiation ? &conn->updated_info : NULL,
-                          conn->tlscreds, &conn->ioc, &conn->err);
+                          conn->tlscreds, &conn->ioc, &local_err);
 
         /*
          * conn->updated_info will finally be returned to the user. Clear the
@@ -194,6 +196,10 @@ static void *connect_thread_func(void *opaque)
 
         qemu_mutex_lock(&conn->mutex);
 
+        error_free(conn->err);
+        conn->err = NULL;
+        error_propagate(&conn->err, local_err);
+
         if (ret < 0) {
             object_unref(OBJECT(conn->sioc));
             conn->sioc = NULL;
@@ -311,14 +317,17 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
             }
 
             conn->running = true;
-            error_free(conn->err);
-            conn->err = NULL;
             qemu_thread_create(&thread, "nbd-connect",
                                connect_thread_func, conn, QEMU_THREAD_DETACHED);
         }
 
         if (!blocking) {
-            error_setg(errp, "No connection at the moment");
+            if (conn->err) {
+                error_propagate(errp, error_copy(conn->err));
+            } else {
+                error_setg(errp, "No connection at the moment");
+            }
+
             return NULL;
         }
 
@@ -339,14 +348,23 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
              * attempt as failed, but leave the connection thread running,
              * to reuse it for the next connection attempt.
              */
-            error_setg(errp, "Connection attempt cancelled by other operation");
+            if (conn->err) {
+                error_propagate(errp, error_copy(conn->err));
+            } else {
+                error_setg(errp,
+                           "Connection attempt cancelled by other operation");
+            }
+
             return NULL;
         } else {
-            error_propagate(errp, conn->err);
-            conn->err = NULL;
-            if (!conn->sioc) {
+            /* Thread finished. There must be either error or sioc */
+            assert(!conn->err != !conn->sioc);
+
+            if (conn->err) {
+                error_propagate(errp, error_copy(conn->err));
                 return NULL;
             }
+
             if (conn->do_negotiation) {
                 memcpy(info, &conn->updated_info, sizeof(*info));
                 if (conn->ioc) {
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-07 20:45   ` Eric Blake
  2021-09-06 19:06 ` [PATCH v3 6/9] iotests.py: add qemu_tool_popen() Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client-connection.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 722998c985..2bda42641d 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -351,8 +351,15 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
             if (conn->err) {
                 error_propagate(errp, error_copy(conn->err));
             } else {
-                error_setg(errp,
-                           "Connection attempt cancelled by other operation");
+                /*
+                 * The only possible case here is cancelling by open_timer
+                 * during nbd_open(). So, the error message is for that case.
+                 * If we have more use cases, we can refactor
+                 * nbd_co_establish_connection_cancel() to take an additional
+                 * parameter cancel_reason, that would be passed than to the
+                 * caller of cancelled nbd_co_establish_connection().
+                 */
+                error_setg(errp, "Connection attempt cancelled by timeout");
             }
 
             return NULL;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 6/9] iotests.py: add qemu_tool_popen()
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 7/9] iotests.py: add and use qemu_io_wrap_args() Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

Split qemu_tool_popen() from qemu_tool_pipe_and_status() to be used
separately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 89663dac06..b518545c09 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -106,14 +106,21 @@ def unarchive_sample_image(sample, fname):
         shutil.copyfileobj(f_in, f_out)
 
 
+def qemu_tool_popen(tool: str, args: Sequence[str],
+                    connect_stderr: bool = True) -> subprocess.Popen:
+    stderr = subprocess.STDOUT if connect_stderr else None
+    return subprocess.Popen(args,
+                            stdout=subprocess.PIPE,
+                            stderr=stderr,
+                            universal_newlines=True)
+
+
 def qemu_tool_pipe_and_status(tool: str, args: Sequence[str],
                               connect_stderr: bool = True) -> Tuple[str, int]:
     """
     Run a tool and return both its output and its exit code
     """
-    stderr = subprocess.STDOUT if connect_stderr else None
-    with subprocess.Popen(args, stdout=subprocess.PIPE,
-                          stderr=stderr, universal_newlines=True) as subp:
+    with qemu_tool_popen(tool, args, connect_stderr) as subp:
         output = subp.communicate()[0]
         if subp.returncode < 0:
             cmd = ' '.join(args)
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 7/9] iotests.py: add and use qemu_io_wrap_args()
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 6/9] iotests.py: add qemu_tool_popen() Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 8/9] iotests.py: add qemu_io_popen() Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

For qemu_io* functions support --image-opts argument, which conflicts
with -f argument from qemu_io_args.

For QemuIoInteractive use new wrapper as well, which allows relying on
default format.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b518545c09..3b7b57489a 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -208,10 +208,17 @@ def img_info_log(filename, filter_path=None, imgopts=False, extra_args=()):
         filter_path = filename
     log(filter_img_info(output, filter_path))
 
+
+def qemu_io_wrap_args(args: Sequence[str]):
+    if '-f' in args or '--image-opts' in args:
+        return qemu_io_args_no_fmt + list(args)
+    else:
+        return qemu_io_args + list(args)
+
+
 def qemu_io(*args):
     '''Run qemu-io and return the stdout data'''
-    args = qemu_io_args + list(args)
-    return qemu_tool_pipe_and_status('qemu-io', args)[0]
+    return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
 
 def qemu_io_log(*args):
     result = qemu_io(*args)
@@ -220,12 +227,7 @@ def qemu_io_log(*args):
 
 def qemu_io_silent(*args):
     '''Run qemu-io and return the exit code, suppressing stdout'''
-    if '-f' in args or '--image-opts' in args:
-        default_args = qemu_io_args_no_fmt
-    else:
-        default_args = qemu_io_args
-
-    args = default_args + list(args)
+    args = qemu_io_wrap_args(args)
     exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
     if exitcode < 0:
         sys.stderr.write('qemu-io received signal %i: %s\n' %
@@ -234,14 +236,14 @@ def qemu_io_silent(*args):
 
 def qemu_io_silent_check(*args):
     '''Run qemu-io and return the true if subprocess returned 0'''
-    args = qemu_io_args + list(args)
+    args = qemu_io_wrap_args(args)
     exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'),
                                stderr=subprocess.STDOUT)
     return exitcode == 0
 
 class QemuIoInteractive:
     def __init__(self, *args):
-        self.args = qemu_io_args_no_fmt + list(args)
+        self.args = qemu_io_wrap_args(args)
         # We need to keep the Popen objext around, and not
         # close it immediately. Therefore, disable the pylint check:
         # pylint: disable=consider-using-with
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 8/9] iotests.py: add qemu_io_popen()
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 7/9] iotests.py: add and use qemu_io_wrap_args() Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-06 19:06 ` [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

Add qemu-io Popen constructor wrapper. To be used in the following new
test commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3b7b57489a..be53c8d5ec 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -216,6 +216,10 @@ def qemu_io_wrap_args(args: Sequence[str]):
         return qemu_io_args + list(args)
 
 
+def qemu_io_popen(*args):
+    return qemu_tool_popen('qemu-io', qemu_io_wrap_args(args))
+
+
 def qemu_io(*args):
     '''Run qemu-io and return the stdout data'''
     return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test
  2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2021-09-06 19:06 ` [PATCH v3 8/9] iotests.py: add qemu_io_popen() Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:06 ` Vladimir Sementsov-Ogievskiy
  2021-09-07 20:51   ` Eric Blake
  8 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 19:06 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, armbru, hreitz, kwolf, vsementsov, eblake, den

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++++++++++++++++++
 .../tests/nbd-reconnect-on-open.out           | 11 +++
 2 files changed, 82 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
 create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open b/tests/qemu-iotests/tests/nbd-reconnect-on-open
new file mode 100755
index 0000000000..7ee9bce947
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+#
+# Test nbd reconnect on open
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+
+import iotests
+from iotests import qemu_img_create, file_path, qemu_io_popen, qemu_nbd, \
+    qemu_io_log, log
+
+iotests.script_initialize(supported_fmts=['qcow2'])
+
+disk, nbd_sock = file_path('disk', 'nbd-sock')
+
+
+def create_args(open_timeout):
+    return ['--image-opts', '-c', 'read 0 1M',
+            f'driver=nbd,open-timeout={open_timeout},'
+            f'server.type=unix,server.path={nbd_sock}']
+
+
+def check_fail_to_connect(open_timeout):
+    log(f'Check fail to connect with {open_timeout} seconds of timeout')
+
+    start_t = time.time()
+    qemu_io_log(*create_args(open_timeout))
+    delta_t = time.time() - start_t
+
+    max_delta = open_timeout + 0.2
+    if open_timeout <= delta_t <= max_delta:
+        log(f'qemu_io finished in {open_timeout}..{max_delta} seconds, OK')
+    else:
+        note = 'too early' if delta_t < open_timeout else 'too long'
+        log(f'qemu_io finished in {delta_t:.1f} seconds, {note}')
+
+
+qemu_img_create('-f', iotests.imgfmt, disk, '1M')
+
+# Start NBD client when NBD server is not yet running. It should not fail, but
+# wait for 5 seconds for the server to be available.
+client = qemu_io_popen(*create_args(5))
+
+time.sleep(1)
+qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, disk)
+
+# client should succeed
+log(client.communicate()[0], filters=[iotests.filter_qemu_io])
+
+# Server was started without --persistent flag, so it should be off now. Let's
+# check it and it the same time check that with open-timeout=0 client fails
+# immediately.
+check_fail_to_connect(0)
+
+# Check that we will fail after non-zero timeout if server is still unavailable
+check_fail_to_connect(1)
diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open.out b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
new file mode 100644
index 0000000000..a35ae30ea4
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
@@ -0,0 +1,11 @@
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check fail to connect with 0 seconds of timeout
+qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
+
+qemu_io finished in 0..0.2 seconds, OK
+Check fail to connect with 1 seconds of timeout
+qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
+
+qemu_io finished in 1..1.2 seconds, OK
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command
  2021-09-06 19:06 ` [PATCH v3 2/9] qapi: make blockdev-add a coroutine command Vladimir Sementsov-Ogievskiy
@ 2021-09-06 19:28   ` Markus Armbruster
  2021-09-06 21:40     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2021-09-06 19:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, hreitz, den, eblake

Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:

> We are going to support nbd reconnect on open in a next commit. This
> means that we want to do several connection attempts during some time.
> And this should be done in a coroutine, otherwise we'll stuck.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 06674c25c9..6e4042530a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4219,7 +4219,8 @@
>  # <- { "return": {} }
>  #
>  ##
> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
> +  'coroutine': true }
>  
>  ##
>  # @blockdev-reopen:

Why is this safe?

Prior discusson:
Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command
  2021-09-06 19:28   ` Markus Armbruster
@ 2021-09-06 21:40     ` Vladimir Sementsov-Ogievskiy
  2021-09-07  6:14       ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-06 21:40 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, den

06.09.2021 22:28, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
> 
>> We are going to support nbd reconnect on open in a next commit. This
>> means that we want to do several connection attempts during some time.
>> And this should be done in a coroutine, otherwise we'll stuck.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 06674c25c9..6e4042530a 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -4219,7 +4219,8 @@
>>   # <- { "return": {} }
>>   #
>>   ##
>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>> +  'coroutine': true }
>>   
>>   ##
>>   # @blockdev-reopen:
> 
> Why is this safe?
> 
> Prior discusson:
> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
> 

Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..

On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.

And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command
  2021-09-06 21:40     ` Vladimir Sementsov-Ogievskiy
@ 2021-09-07  6:14       ` Markus Armbruster
  2021-09-27 11:25         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2021-09-07  6:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, hreitz, den, eblake

Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:

> 06.09.2021 22:28, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
>> 
>>> We are going to support nbd reconnect on open in a next commit. This
>>> means that we want to do several connection attempts during some time.
>>> And this should be done in a coroutine, otherwise we'll stuck.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   qapi/block-core.json | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 06674c25c9..6e4042530a 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -4219,7 +4219,8 @@
>>>   # <- { "return": {} }
>>>   #
>>>   ##
>>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>>> +  'coroutine': true }
>>>   
>>>   ##
>>>   # @blockdev-reopen:
>> 
>> Why is this safe?
>> 
>> Prior discusson:
>> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
>> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
>> 
>
> Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..

Breaking existing block drivers is more serious than adding new drivers
broken.

> On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.
>
> And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?

If it's already used in ways that require coroutine-compatibility, we'd
merely add another way for existing bugs to bite.  Kevin, is it?

In general, the less coroutine-incompatibility we have, the better.
From the thread I quoted:

    Kevin Wolf <kwolf@redhat.com> writes:

    > AM 22.01.2020 um 13:15 hat Markus Armbruster geschrieben:
    [...]
    >> Is coroutine-incompatible command handler code necessary or accidental?
    >> 
    >> By "necessary" I mean there are (and likely always will be) commands
    >> that need to do stuff that cannot or should not be done on coroutine
    >> context.
    >> 
    >> "Accidental" is the opposite: coroutine-incompatibility can be regarded
    >> as a fixable flaw.
    >
    > I expect it's mostly accidental, but maybe not all of it.

I'm inclinded to regard accidental coroutine-incompatibility as a bug.

As long as the command doesn't have the coroutine flag set, it's a
latent bug.

Setting the coroutine flag without auditing the code risks making such
latent bugs actual bugs.  A typical outcome is deadlock.

Whether we're willing to accept such risk is not for me to decide.

We weren't when we merged the coroutine work, at least not wholesale.
The risk is perhaps less scary for a single command.  On the other hand,
code audit is less daunting, too.

Kevin, what do you think?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp
  2021-09-06 19:06 ` [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp Vladimir Sementsov-Ogievskiy
@ 2021-09-07 17:44   ` Eric Blake
  2021-09-24 19:19     ` Eric Blake
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Blake @ 2021-09-07 17:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Mon, Sep 06, 2021 at 10:06:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> When we don't have a connection and blocking is false, we return NULL
> but don't set errp. That's wrong.

Oops...

> 
> We have two paths for calling nbd_co_establish_connection():
> 
> 1. nbd_open() -> nbd_do_establish_connection() -> ...
>   but that will never set blocking=false
> 
> 2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
>   but that uses errp=NULL
> 
> So, we are safe with our wrong errp policy in
> nbd_co_establish_connection(). Still let's fix it.

...phew!  Thus, it's not critical to backport.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client-connection.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index 7123b1e189..695f855754 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -318,6 +318,7 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
>          }
>  
>          if (!blocking) {
> +            error_setg(errp, "No connection at the moment");
>              return NULL;
>          }
>  
> -- 
> 2.29.2
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options
  2021-09-06 19:06 ` [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options Vladimir Sementsov-Ogievskiy
@ 2021-09-07 20:36   ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2021-09-07 20:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Mon, Sep 06, 2021 at 10:06:48PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> It is useful when start of vm and start of nbd server are not
> simple to sync.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json |  9 ++++++++-
>  block/nbd.c          | 45 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 52 insertions(+), 2 deletions(-)
>

Reviewed-by: Eric Blake <eblake@redhat.com>

This is when qemu is acting as NBD client (and does not affect qemu
acting as NBD server).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error
  2021-09-06 19:06 ` [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error Vladimir Sementsov-Ogievskiy
@ 2021-09-07 20:42   ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2021-09-07 20:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Mon, Sep 06, 2021 at 10:06:49PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The only user of errp is call to nbd_do_establish_connection() in
> nbd_open(). The only way to cancel this call is through open_timer

The only caller of nbd_do_establish_connection() that uses errp is
nbd_open().

> timeout. And for this case, user will be more interested in description
> of last failed connect rather than in
> "Connection attempt cancelled by other operation".
> 
> So, let's change behavior on cancel to return previous failure error if
> available.
> 
> Do the same for non-blocking failure case. In this case we still don't
> have a caller that is interested in errp. But let's be consistent.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client-connection.c | 50 ++++++++++++++++++++++++++++-------------
>  1 file changed, 34 insertions(+), 16 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt
  2021-09-06 19:06 ` [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt Vladimir Sementsov-Ogievskiy
@ 2021-09-07 20:45   ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2021-09-07 20:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Mon, Sep 06, 2021 at 10:06:50PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client-connection.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index 722998c985..2bda42641d 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -351,8 +351,15 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
>              if (conn->err) {
>                  error_propagate(errp, error_copy(conn->err));
>              } else {
> -                error_setg(errp,
> -                           "Connection attempt cancelled by other operation");
> +                /*
> +                 * The only possible case here is cancelling by open_timer
> +                 * during nbd_open(). So, the error message is for that case.
> +                 * If we have more use cases, we can refactor
> +                 * nbd_co_establish_connection_cancel() to take an additional
> +                 * parameter cancel_reason, that would be passed than to the
> +                 * caller of cancelled nbd_co_establish_connection().
> +                 */
> +                error_setg(errp, "Connection attempt cancelled by timeout");
>              }
>  
>              return NULL;
> -- 
> 2.29.2
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test
  2021-09-06 19:06 ` [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test Vladimir Sementsov-Ogievskiy
@ 2021-09-07 20:51   ` Eric Blake
  2021-09-08  7:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Blake @ 2021-09-07 20:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Mon, Sep 06, 2021 at 10:06:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  .../qemu-iotests/tests/nbd-reconnect-on-open  | 71 +++++++++++++++++++
>  .../tests/nbd-reconnect-on-open.out           | 11 +++
>  2 files changed, 82 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/nbd-reconnect-on-open
>  create mode 100644 tests/qemu-iotests/tests/nbd-reconnect-on-open.out

I'm less confident in my review of the python code, but...

> 
> diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open b/tests/qemu-iotests/tests/nbd-reconnect-on-open
> new file mode 100755
> index 0000000000..7ee9bce947
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open
> @@ -0,0 +1,71 @@

> +
> +def create_args(open_timeout):
> +    return ['--image-opts', '-c', 'read 0 1M',
> +            f'driver=nbd,open-timeout={open_timeout},'
> +            f'server.type=unix,server.path={nbd_sock}']
> +
> +
> +def check_fail_to_connect(open_timeout):
> +    log(f'Check fail to connect with {open_timeout} seconds of timeout')
> +
> +    start_t = time.time()
> +    qemu_io_log(*create_args(open_timeout))
> +    delta_t = time.time() - start_t
> +
> +    max_delta = open_timeout + 0.2

Is this fractional delay going to bite us on heavily-loaded CI machines?

> +    if open_timeout <= delta_t <= max_delta:
> +        log(f'qemu_io finished in {open_timeout}..{max_delta} seconds, OK')
> +    else:
> +        note = 'too early' if delta_t < open_timeout else 'too long'
> +        log(f'qemu_io finished in {delta_t:.1f} seconds, {note}')
> +
> +
> +qemu_img_create('-f', iotests.imgfmt, disk, '1M')
> +
> +# Start NBD client when NBD server is not yet running. It should not fail, but
> +# wait for 5 seconds for the server to be available.
> +client = qemu_io_popen(*create_args(5))
> +
> +time.sleep(1)
> +qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, disk)
> +
> +# client should succeed
> +log(client.communicate()[0], filters=[iotests.filter_qemu_io])
> +
> +# Server was started without --persistent flag, so it should be off now. Let's
> +# check it and it the same time check that with open-timeout=0 client fails

check it and at

> +# immediately.
> +check_fail_to_connect(0)
> +
> +# Check that we will fail after non-zero timeout if server is still unavailable
> +check_fail_to_connect(1)
> diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open.out b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
> new file mode 100644
> index 0000000000..a35ae30ea4
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open.out
> @@ -0,0 +1,11 @@
> +read 1048576/1048576 bytes at offset 0
> +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +Check fail to connect with 0 seconds of timeout
> +qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
> +
> +qemu_io finished in 0..0.2 seconds, OK
> +Check fail to connect with 1 seconds of timeout
> +qemu-io: can't open: Failed to connect to 'TEST_DIR/PID-nbd-sock': No such file or directory
> +
> +qemu_io finished in 1..1.2 seconds, OK

Overall, the test looks like a nice demonstration of the feature: you
are showing that the client can now start before the server, and that
the retry for N seconds is handled gracefully both by the creation of
the server and by the expiration of the retry timeout.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test
  2021-09-07 20:51   ` Eric Blake
@ 2021-09-08  7:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-08  7:55 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, qemu-devel, armbru, hreitz, kwolf, den

07.09.2021 23:51, Eric Blake wrote:
>> +def check_fail_to_connect(open_timeout):
>> +    log(f'Check fail to connect with {open_timeout} seconds of timeout')
>> +
>> +    start_t = time.time()
>> +    qemu_io_log(*create_args(open_timeout))
>> +    delta_t = time.time() - start_t
>> +
>> +    max_delta = open_timeout + 0.2
> Is this fractional delay going to bite us on heavily-loaded CI machines?
> 

You mean that it's too small and may be racy? Hmm. But increasing it now means wasting extra time.. I'd adjust it when CI fail if it happens.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp
  2021-09-07 17:44   ` Eric Blake
@ 2021-09-24 19:19     ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2021-09-24 19:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-block, qemu-devel, armbru, hreitz, den

On Tue, Sep 07, 2021 at 12:44:53PM -0500, Eric Blake wrote:
> On Mon, Sep 06, 2021 at 10:06:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > When we don't have a connection and blocking is false, we return NULL
> > but don't set errp. That's wrong.
> 
> Oops...
> 
> > 
> > We have two paths for calling nbd_co_establish_connection():
> > 
> > 1. nbd_open() -> nbd_do_establish_connection() -> ...
> >   but that will never set blocking=false
> > 
> > 2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
> >   but that uses errp=NULL
> > 
> > So, we are safe with our wrong errp policy in
> > nbd_co_establish_connection(). Still let's fix it.
> 
> ...phew!  Thus, it's not critical to backport.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

Queuing this one through my NBD tree.  Given the discussion on 2/9,
I'll leave the rest of the series for later.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/9] qapi: make blockdev-add a coroutine command
  2021-09-07  6:14       ` Markus Armbruster
@ 2021-09-27 11:25         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 21+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-09-27 11:25 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, den

07.09.2021 09:14, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
> 
>> 06.09.2021 22:28, Markus Armbruster wrote:
>>> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> writes:
>>>
>>>> We are going to support nbd reconnect on open in a next commit. This
>>>> means that we want to do several connection attempts during some time.
>>>> And this should be done in a coroutine, otherwise we'll stuck.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    qapi/block-core.json | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 06674c25c9..6e4042530a 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -4219,7 +4219,8 @@
>>>>    # <- { "return": {} }
>>>>    #
>>>>    ##
>>>> -{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true }
>>>> +{ 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true,
>>>> +  'coroutine': true }
>>>>    
>>>>    ##
>>>>    # @blockdev-reopen:
>>>
>>> Why is this safe?
>>>
>>> Prior discusson:
>>> Message-ID: <87lfq0yp9v.fsf@dusky.pond.sub.org>
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg04921.html
>>>
>>
>> Hmm.. I'm afraid, that I can't prove that it's safe. At least it will mean to audit .bdrv_open() of all block drivers.. And nothing prevents creating new incompatible drivers in future..
> 
> Breaking existing block drivers is more serious than adding new drivers
> broken.
> 
>> On the other hand, looking at qmp_blockdev_add, bdrv_open() is the only thing of interest.
>>
>> And theoretically, bdrv_open() should work in coroutine context. We do call this function from coroutine_fn functions sometimes. So, maybe, if in some circumstances, bdrv_open() is not compatible with coroutine context, we can consider it as a bug? And fix it later, if it happen?
> 
> If it's already used in ways that require coroutine-compatibility, we'd
> merely add another way for existing bugs to bite.  Kevin, is it?
> 
> In general, the less coroutine-incompatibility we have, the better.
>  From the thread I quoted:
> 
>      Kevin Wolf <kwolf@redhat.com> writes:
> 
>      > AM 22.01.2020 um 13:15 hat Markus Armbruster geschrieben:
>      [...]
>      >> Is coroutine-incompatible command handler code necessary or accidental?
>      >>
>      >> By "necessary" I mean there are (and likely always will be) commands
>      >> that need to do stuff that cannot or should not be done on coroutine
>      >> context.
>      >>
>      >> "Accidental" is the opposite: coroutine-incompatibility can be regarded
>      >> as a fixable flaw.
>      >
>      > I expect it's mostly accidental, but maybe not all of it.
> 
> I'm inclinded to regard accidental coroutine-incompatibility as a bug.
> 
> As long as the command doesn't have the coroutine flag set, it's a
> latent bug.
> 
> Setting the coroutine flag without auditing the code risks making such
> latent bugs actual bugs.  A typical outcome is deadlock.
> 
> Whether we're willing to accept such risk is not for me to decide.
> 
> We weren't when we merged the coroutine work, at least not wholesale.
> The risk is perhaps less scary for a single command.  On the other hand,
> code audit is less daunting, too.
> 
> Kevin, what do you think?
> 


Any thoughts?

I think, we can simply proceed without this patch. That just means that blockdev-add remains blocking, and using it to add nbd node with long open-timeout when guest is running [*] will be inconvenient (we don't want to block the running guest). Still commandline interface, and running blockdev-add when guest is paused is OK.

Anyway, this case [*] is not super useful: OK, guest isn't blocked, but you can't issue more qmp commands until open-timeout passed. That's not very convenient for running vm.

Side thought: don't we have/plan async qmp commands or something like this? So, that command is started in a coroutine, but user don't have to wait for finish to run more QMP commands? It should be more useful for command that may take long time. We have jobs, but implementing new job for such command seems too heavy.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-09-27 11:28 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 19:06 [PATCH v3 0/9] nbd reconnect on open Vladimir Sementsov-Ogievskiy
2021-09-06 19:06 ` [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp Vladimir Sementsov-Ogievskiy
2021-09-07 17:44   ` Eric Blake
2021-09-24 19:19     ` Eric Blake
2021-09-06 19:06 ` [PATCH v3 2/9] qapi: make blockdev-add a coroutine command Vladimir Sementsov-Ogievskiy
2021-09-06 19:28   ` Markus Armbruster
2021-09-06 21:40     ` Vladimir Sementsov-Ogievskiy
2021-09-07  6:14       ` Markus Armbruster
2021-09-27 11:25         ` Vladimir Sementsov-Ogievskiy
2021-09-06 19:06 ` [PATCH v3 3/9] nbd: allow reconnect on open, with corresponding new options Vladimir Sementsov-Ogievskiy
2021-09-07 20:36   ` Eric Blake
2021-09-06 19:06 ` [PATCH v3 4/9] nbd/client-connection: nbd_co_establish_connection(): return real error Vladimir Sementsov-Ogievskiy
2021-09-07 20:42   ` Eric Blake
2021-09-06 19:06 ` [PATCH v3 5/9] nbd/client-connection: improve error message of cancelled attempt Vladimir Sementsov-Ogievskiy
2021-09-07 20:45   ` Eric Blake
2021-09-06 19:06 ` [PATCH v3 6/9] iotests.py: add qemu_tool_popen() Vladimir Sementsov-Ogievskiy
2021-09-06 19:06 ` [PATCH v3 7/9] iotests.py: add and use qemu_io_wrap_args() Vladimir Sementsov-Ogievskiy
2021-09-06 19:06 ` [PATCH v3 8/9] iotests.py: add qemu_io_popen() Vladimir Sementsov-Ogievskiy
2021-09-06 19:06 ` [PATCH v3 9/9] iotests: add nbd-reconnect-on-open test Vladimir Sementsov-Ogievskiy
2021-09-07 20:51   ` Eric Blake
2021-09-08  7:55     ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.