On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Reconnect will be implemented in the following commit, so for now,
> in semantics below, disconnect itself is a "serious error".
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json | 12 +++++++++++-
>  block/nbd-client.h   |  1 +
>  block/nbd-client.c   |  1 +
>  block/nbd.c          | 16 +++++++++++++++-
>  4 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5b9084a394..cf03402ec5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3511,13 +3511,23 @@
>  #                  traditional "base:allocation" block status (see
>  #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
>  #
> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect

Maybe 'On unexpected disconnect', since an intentional disconnect should
not trigger reconnect attempts.

> +#                   again, until success or serious error. During first
> +#                   @reconnect-delay seconds of reconnecting loop all requests
> +#                   are paused and have a chance to rerun, if successful
> +#                   connect occures during this time. After @reconnect-delay

occurs

> +#                   seconds all delayed requests are failed and all following
> +#                   requests will be failed to (until successfull reconnect).

successful

> +#                   Default 300 seconds (Since 3.1)

Given how late my review is, this now has to be 4.0.

I'm guessing that a delay of 0 disables auto-reconnect.  From a
backwards-compatibility standpoint, no auto-reconnect is closer to what
we previously had.  From a usability standpoint, though, reconnecting
can keep a transient network hiccup from turning into permanent loss of
the device to EIO errors, especially if the retry window is long enough
for an administrator to reroute the network to an alternative server.
So I'm probably okay with a non-zero default.  But it DOES mean that
where a severed connection used to produce instant EIO failures, you
now have to wait for the reconnect delay to expire, and 5 minutes can
be a long wait.  Since the delay is guest-observable, could a guest that
is used to instant EIO and total loss of the device instead get confused
by receiving no response at all for up to 5 minutes, whether that
response eventually turns out to be EIO or a successful recovery?
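To make the guest-visible timing concrete, here is a rough C model of
the semantics as documented above.  It is only a sketch under
assumptions: nbd_request_action() and the NBD_REQ_* values are
hypothetical names that do not exist in the series, time is modeled as
plain monotonic seconds, and treating 0 as "fail immediately" is my
guess from above, not something the patch documents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request decision while the client is reconnecting. */
typedef enum {
    NBD_REQ_WAIT, /* keep the request paused; a reconnect may still save it */
    NBD_REQ_SEND, /* connection is up again: (re)send the request */
    NBD_REQ_EIO,  /* delay expired (or reconnect disabled): fail with EIO */
} NbdReqAction;

/*
 * connected:       whether a reconnect attempt has succeeded
 * disconnect_time: monotonic seconds at which the unexpected disconnect
 *                  happened
 * now:             current monotonic time in seconds
 * reconnect_delay: value of the (assumed) reconnect-delay option; with 0
 *                  the wait window is empty, i.e. the old instant-EIO
 *                  behaviour
 */
NbdReqAction nbd_request_action(bool connected, uint64_t disconnect_time,
                                uint64_t now, uint64_t reconnect_delay)
{
    assert(now >= disconnect_time);
    if (connected) {
        /* A successful reconnect ends the failure mode at any time. */
        return NBD_REQ_SEND;
    }
    if (now - disconnect_time < reconnect_delay) {
        /* Still inside the window: the guest sees no reply at all. */
        return NBD_REQ_WAIT;
    }
    /* Window over: paused and new requests fail until a reconnect. */
    return NBD_REQ_EIO;
}
```

With the default of 300, a request issued right after the connection
drops stays in NBD_REQ_WAIT for just under 300 seconds before it either
gets sent or fails, and that silent window is what the guest would see.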

> +++ b/block/nbd.c
> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>              .help = "experimental: expose named dirty bitmap in place of "
>                      "block status",
>          },
> +        {
> +            .name = "reconnect-delay",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
> +                    "connect again, until success or serious error. During"
> +                    "first @reconnect-delay seconds of reconnecting loop all"
> +                    "requests are paused and have a chance to rerun, if"
> +                    "successful connect occures during this time. After"
> +                    "@reconnect-delay seconds all delayed requests are failed"
> +                    "and all following requests will be failed to (until"
> +                    "successfull reconnect). Default 300 seconds",

Same typos as in qapi.
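Separately, the .help fragments are adjacent string literals, which C
joins at compile time with no separator, so "tries to" "connect again"
renders as "tries toconnect again"; each fragment needs a trailing
space.  A minimal standalone illustration (the variable names are made
up for the example):

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are concatenated with nothing in between,
 * just like the fragments of the "reconnect-delay" .help text. */
const char help_as_posted[] = "nbd client tries to"
                              "connect again";

/* Ending each fragment with a space gives the intended text. */
const char help_fixed[] = "nbd client tries to "
                          "connect again";
```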

The UI aspects look fine, now I need to review the patch series for code
issues :)


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org